<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Collective Intelligence &amp; the Semantic Web</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Preface</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dr. Yannis Avrithis, National Technical University of Athens, Greece; Dr. Yiannis Kompatsiaris, CERTH-ITI, Greece; Prof. Steffen Staab, University of Koblenz-Landau, Germany; Prof. Athena Vakali, Aristotle University of Thessaloniki, Greece</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2008</year>
      </pub-date>
      <fpage>78</fpage>
      <lpage>119</lpage>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Preface</title>
      <p>http://mklab.iti.gr/CISWeb
This volume includes the papers presented at the 1st International Workshop on “Collective
Semantics: Collective Intelligence &amp; the Semantic Web (CISWeb 2008)”, which was hosted by
the 5th European Semantic Web Conference (ESWC-08) in Tenerife, Spain, on June 2nd, 2008.
Web 2.0 technologies have introduced new information-sharing practices that favor mass user
participation and aim to improve the quality of information content and information organization.
It is challenging to dynamically capture the knowledge that emerges from the interactions of
masses of users in social networks, given the heterogeneity of data sources, the large
information scale, and the huge volume of postings. The Semantic Web may contribute by providing
a language basis, structuring help from distributed ad hoc ontologies, and new ways of exploring
the information space.</p>
      <p>In this context, the CISWeb 2008 Workshop attracted very interesting work covering crucial
and emerging research topics such as using and enriching ontologies, semantically enhancing
folksonomies and webspaces, social data management, and interrelating Web 2.0 and the Semantic
Web. More specifically, ideas were presented at the Workshop for ontology matching via
knowledge extracted from multiple ontologies, enriching ontological user profiles with tagging
history, merging Web 2.0 and the Semantic Web by (semi-)automated content tagging, and
semantically enriching folksonomies and tagging. Most of these efforts were evaluated on
popular datasets and testbeds (such as Wikipedia, Flickr, and LycosIQ).
There were 11 submissions from 9 countries, and three reviewers were assigned to each paper.
The program committee selected 5 regular papers and 3 poster papers for presentation at the
workshop. We would like to thank all the program committee members for their dedicated effort
to review papers in their areas of expertise in a timely manner. Their effort was valuable in
assembling a high-quality CISWeb 2008 program.</p>
      <p>The research work presented at CISWeb 2008 was very interesting and exciting, and the
Workshop involved lively discussions and fruitful comments. Moreover, the program included a
very interesting invited talk by Prof. Bettina Hoser, from the Universität Karlsruhe, who
presented “Information Retrieval versus Knowledge Retrieval: A Social Network Perspective”, an
emerging topic of wide interest. We are grateful to Prof. Bettina Hoser for her insightful
presentation.</p>
      <p>Special thanks are owed to Eirini Giannakidou, PhD student at the CERTH Research
Institute, for her technical support in organizing CISWeb 2008. The workshop was held in
cooperation with the European Commission and the WeKnowIt Integrated Project, and we are
indebted to them for their contributions and financial support.</p>
      <p>CISWeb 2008 Co-Chairs</p>
      <p>Conference Organization</p>
    </sec>
    <sec id="sec-2">
      <title>Programme Chairs</title>
      <sec id="sec-2-1">
        <title>Yannis Avrithis, Ioannis Kompatsiaris, Steffen Staab, Athena Vakali</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Programme Committee</title>
      <sec id="sec-3-1">
        <title>Harith Alani</title>
        <p>Andrea Baldassarri
Nick Bassiliades
Susanne Boll
Ciro Cattuto
Thierry Declerck
Ying Ding
William Grosky
Harry Halpin
Andreas Hotho
Paul Lewis
Jose Martinez
Phivos Mylonas
Lyndon Nixon
Noel O'Connor
Raphael Troncy</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>External Reviewers</title>
      <sec id="sec-4-1">
        <title>Eirini Giannakidou, Gianluca Correndo, Ioannis Katakis, Georgios Meditskos</title>
        <p>Author Index
Alani, Harith . . . 5
Aleksovski, Zharko . . . 35
Angeletou, Sofia . . . 65
ten Kate, Warner . . . 35
Tojo, João . . . 50
van Harmelen, Frank . . . 35
Semantically enhanced webspace for scientific collaboration . . . 109</p>
        <p>Daniel Harezlak, Piotr Nowakowski, Marian Bubak</p>
        <p>Bettina Hoser</p>
        <p>Information Services and Electronic Markets,
Institute of Information Engineering and Management,
Department of Economics and Business Engineering,</p>
        <p>Universität Karlsruhe (TH)</p>
        <p>Germany
hoser@iism.uni-karlsruhe.de
</p>
        <sec id="sec-4-1-1">
          <title>Introduction</title>
          <p>When is a trend a trend? When 'the right people' initialize it. This is very well
known from the world of fashion. In the world of news, research and technology,
this may translate to the fact that a trend is a trend when 'relevant' people
or websites take up the topic. But how can the relevant people or websites be
distinguished from the less relevant? How can 'relevant' be defined? How can
one detect really 'relevant' trends? 'Relevant' is always a reflective notion: it
depends on the circumstances and thus, e.g. in the case of fashion or news, on
the social context.</p>
          <p>As an example of the question discussed here, take a high-tech company (e.g.
mobile phones) or a reinsurance company. For both it is essential that they see
trends before their competitors or potential clients see them. In the case of the
high-tech company, for example, it is crucial to know what the potential customers
are interested in, or which features of the current product are not accepted and
why. For the reinsurance company, it is necessary to know which hazards, e.g.
in health care, are being discussed, so that the company may prepare its policy
accordingly. As an illustration of that point, take the discussion on obesity in
children and subsequent health problems in adults.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>Information retrieval</title>
          <p>As companies look for ways to find trends like those sketched above, they
traditionally turned, for example, to newspapers. Nowadays the internet, with its chat rooms,
newsgroups, social networking sites and blogs, offers a wide range of information that was not
accessible before. Various methods have been devised to gather this information.</p>
          <p>Text analysis is one of the methods often used to extract information from
a text source. There is a large body of research literature, see e.g. [FNR03],
in the fields of linguistics, information science and classification on diverse ways
to extract keywords, key phrases, etc. from websites and other text sources. In
these research fields, models have been built to explain how the context-sensitive
relevance of words, phrases, etc. can be defined; think of classification schemes
such as the ACM Computing Classification System. Some of these methods yield lists of
candidate topics ranked by relative relevance according to the usage of phrases in the
text.</p>
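As a toy illustration of such frequency-based ranking, the following Python sketch ranks terms by their share of non-stopword tokens. The sample text, stopword list, and scoring scheme are invented for illustration, not taken from the cited literature.

```python
from collections import Counter
import re

# A deliberately small stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "of", "in", "is", "and", "to"}

def keyword_ranking(text):
    """Rank non-stopword terms by their share of content tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    total = sum(counts.values())
    # Relative relevance: share of non-stopword tokens.
    return [(term, n / total) for term, n in counts.most_common()]

ranking = keyword_ranking("The battery of the phone is weak; battery life matters.")
# "battery" appears twice, so it tops the ranking.
```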
          <p>Another approach is to use additional information such as keywords or tags to
enhance the retrieved information by classifying it. This has grown into the
research fields of folksonomies, tagging, the semantic web, etc.</p>
          <p>At this point, though, what is known is that these phrases or words are often
used. What is not known is who used them, or, to put it precisely, whether the
user is a 'relevant' user in the context. This question has been at the
heart of the research field of Social Network Analysis.</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>Social Network Analysis</title>
          <p>Social Network Analysis (SNA) is a research area that tries to analyze and model
actor behavior based on his or her connections or relations to other members of
a group; for further reference see [WF99]. An actor is thus seen as restricted
or empowered by his or her connections to others. The basis of this structural
approach is given by models of group interaction. The first research questions
aimed to assign roles to actors in a given social context; leadership of a group
is one such role. There are also models about the power to manipulate. A person
in such a context may be called relevant, or central, if he or she is positioned
in the group's network in such a way that all information exchanged between any
two actors has to pass through this 'central' actor. He or she can thus
manipulate the group.</p>
          <p>Thus the question of who is relevant within a group is one of the research
questions within SNA. Based on graph theory, it can be analyzed using
different so-called centrality indices. Some of them are intuitive, like degree
centrality; others are more elaborate, like betweenness centrality or
eigenvector centrality. But the question is always: given a clearly defined context, who
within a group is relevant and who is not, how are the actors in the group connected,
and what predictions, if any, can be made about the future development of the
group structure?</p>
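A minimal sketch of the simplest such index, degree centrality, computed on an invented toy network; the graph and actor names are hypothetical, for illustration only.

```python
def degree_centrality(graph):
    """Normalized degree centrality: number of neighbours / (n - 1)."""
    n = len(graph)
    return {node: len(neigh) / (n - 1) for node, neigh in graph.items()}

# A star-shaped group: every exchange passes through "hub".
graph = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
    "d": {"hub"},
}

centrality = degree_centrality(graph)
# "hub" is connected to all 4 other actors, so it is the most central.
```

Betweenness and eigenvector centrality require shortest-path or spectral computations and are correspondingly more elaborate, as the text notes.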
          <p>Thus, this analysis approach can be used to find the 'relevant' people or
websites needed to enhance the information found by text retrieval.</p>
        </sec>
        <sec id="sec-4-1-4">
          <title>Knowledge Retrieval</title>
          <p>The idea of knowledge retrieval means not only gathering the available
information but enriching it with other information to gain knowledge about a topic.
In the case proposed here, this means using results from SNA to enrich the
information gathered by text analysis, to find out whether the topics found by
information retrieval are 'really hot topics' because 'relevant people' talk about them, or
just 'small talk' by 'bystanders'. In a conceptual study [HSGS+07]
we used such an approach to look for socially enriched information about mobile
phones within a newsgroup.</p>
          <p>The idea proposed here is based on the following information fusion approach:
first, a text corpus and a group are defined; then the text corpus is analyzed and
the group structure is evaluated; as a last step, these two results are combined to
gain knowledge. This is just a very crude and short description of the procedure.
One major challenge is to define the group. Depending on the area of
interest, this can be a very large group or a collection of websites corresponding to
a group. Sometimes it may not even be a well-defined group, so biases can
be introduced by the choice of actors (or websites). Once the group is defined,
there is also the question of the appropriate text analysis method; questions of
scalability and validity have to be answered here. As a last step, the
interpretation of the combined results has to be validated before any measures are
taken.</p>
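The fusion step might be sketched as follows: topic counts from text analysis are weighted by each author's SNA relevance score, so that a topic pushed by central actors outranks mere 'small talk'. All names, scores, and counts below are hypothetical placeholders, not a method from the talk.

```python
def socially_weighted_topics(mentions, relevance):
    """mentions: {author: {topic: count}}; relevance: {author: score in [0, 1]}.
    Returns each topic's mention count weighted by author relevance."""
    scores = {}
    for author, topics in mentions.items():
        w = relevance.get(author, 0.0)  # unknown authors count for nothing
        for topic, count in topics.items():
            scores[topic] = scores.get(topic, 0.0) + w * count
    return scores

mentions = {
    "alice": {"battery life": 3},          # a central actor
    "bob": {"battery life": 1, "ads": 5},  # a peripheral 'bystander'
}
relevance = {"alice": 0.9, "bob": 0.1}

scores = socially_weighted_topics(mentions, relevance)
# "battery life": 0.9*3 + 0.1*1 = 2.8 ; "ads": 0.1*5 = 0.5
```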
          <p>Even in view of the aforementioned challenges, this approach seems to
yield deeper insights into topics and trends, since it includes the social component
of trends.</p>
        </sec>
        <sec id="sec-4-1-5">
          <title>Outlook</title>
          <p>The potential of such an approach is very high. Not only companies are interested
in this kind of knowledge, gained from different 'news' sources and weighted by
social impact, but also the average internet user. If one takes a look at
communities of diverse interests, such as travel, or at such necessities as emergencies,
it is not only valuable to have information at hand gathered from collective sites,
but also to know who gave the information and whether the source can be viewed
as 'relevant' in the given context. In the context of emergencies, this may save
lives.</p>
          <p>Enriching Ontological User Profiles with Tagging History
for Multi-Domain Recommendations</p>
          <p>Iván Cantador1, Martin Szomszor2, Harith Alani2,</p>
          <p>Miriam Fernández1, Pablo Castells1</p>
          <p>1 Escuela Politécnica Superior
Universidad Autónoma de Madrid</p>
          <p>28049 Madrid, Spain
{ivan.cantador, miriam.fernandez, pablo.castells}@uam.es
2 School of Electronics and Computer Science</p>
          <p>University of Southampton
SO17 1BJ Southampton, United Kingdom</p>
          <p>{mns2, ha}@ecs.soton.ac.uk
Abstract. Many advanced recommendation frameworks employ ontologies of
various complexities to model individuals and items, providing a mechanism
for the expression of user interests and the representation of item attributes. As
a result, complex matching techniques can be applied to support individuals in
the discovery of items according to explicit and implicit user preferences.
Recently, the rapid adoption of Web 2.0, and the proliferation of social
networking sites, have resulted in more and more users providing an increasing
amount of information about themselves that could be exploited for
recommendation purposes. However, the unification of personal information
with ontologies using the contemporary knowledge representation methods
often associated with Web 2.0 applications, such as community tagging, is a
non-trivial task. In this paper, we propose a method for the unification of tags
with ontologies by grounding tags in a shared representation in the form of
WordNet and Wikipedia. We incorporate individuals’ tagging history into their
ontological profiles by matching tags with ontology concepts. This approach is
preliminarily evaluated by extending an existing news recommendation system
with user tagging histories harvested from popular social networking sites.</p>
          <p>1 Introduction
The increasing proliferation of Web 2.0-style sharing platforms, coupled with the rapid
development of novel ways to exploit them, is paving the way for new paradigms in
Web usage. Virtual communities and on-line services such as social networking,
folksonomies, blogs, and wikis are fostering an increase in user participation,
engaging users and encouraging them to share more and more information, resources,
and opinions. The huge amount of information resulting from this emerging
phenomenon gives rise to excellent opportunities to investigate, understand, and
exploit knowledge about users’ interests, preferences and needs. However, the
current infrastructure of the Web does not provide the mechanisms necessary to
consolidate this wealth of personal data, since it is spread over many unconnected,
heterogeneous sources.</p>
          <p>Community tagging sites, and their respective folksonomies, are a clear example of
this situation: users have access to a plethora of web sites that allow them to annotate
and share many types of resources. For example, they can organise and make photos
available on Flickr1, classify and share bookmarks using del.icio.us2, and communicate
and share resources with friends using Facebook3. Through personal tags, users implicitly
declare different facets of their personalities, such as their favourite book subjects on
LibraryThing4, movie preferences on IMDb5, music tastes on Last.fm6, and so forth.
The domains covered by social tagging applications are therefore both disparate and
divergent, creating considerably complex and extensive descriptions of user profiles.</p>
          <p>In the current Web 2.0 landscape, there is a distinct lack of tools to support users with
meaningful ways to query and retrieve resources spread over disparate end-points: users
should be able to search consistently across a broad range of sites for diverse media
types such as articles, reviews, videos, and photos. Furthermore, such sites could be
used to support the recommendation of new resources belonging to multiple domains
based on tags from different sites. As a step towards making this vision a reality, we
explore the use of syntactic and semantic based technologies for the combination,
communication and exploitation of information from different social systems.</p>
          <p>
            In this paper, we present an approach for the consolidation of social tagging
information from multiple sources into ontologies that describe the domains of
interest covered by the tags. Ontology-based user profiles enable rich comparisons of
user interests against semantic annotations of resources, in order to make personal
recommendations. This principle has already been tested by the authors in different
personalised information retrieval frameworks, such as semantic query-based
searching [
            <xref ref-type="bibr" rid="ref22 ref4">4</xref>
            ], personalised context-aware content retrieval [
            <xref ref-type="bibr" rid="ref13 ref31">13</xref>
            ], group-oriented
profiling [
            <xref ref-type="bibr" rid="ref21 ref3">3</xref>
            ], and multi-facet hybrid recommendations [
            <xref ref-type="bibr" rid="ref2 ref20">2</xref>
            ].
          </p>
          <p>We propose to feed the previous strategies with user profiles built from personal
tag clouds obtained from the Flickr and del.icio.us web sites. The mapping of those social
tags to our ontological structures involves three steps: the filtering of tags, the
acquisition of semantic information from the Web to map the remaining tags into a
common vocabulary, and the categorisation of the obtained concepts according to the
existing ontology classes.</p>
          <p>
            An application of the above techniques has been tested in News@hand, a news
recommender system which integrates our different ontology-based recommendation
approaches. In this system, ontological knowledge bases and user profiles are
generated from public social tagging information, using the aforementioned
techniques. The News@hand system, along with the automatic acquisition of news
articles from the Web, and the automatic semantic annotation of these items using
Natural Language Processing tools [
            <xref ref-type="bibr" rid="ref1 ref19">1</xref>
            ] and the Lucene7 indexer shall also be described.
1 Flickr, Photo Sharing, http://www.flickr.com/
2 del.icio.us, Social Bookmark manager, http://del.icio.us/
3 Facebook, Social Networking, http://www.facebook.com/
4 LibraryThing, Personal Online Book Catalogues, http://www.librarything.com/
5 IMDb, Internet Movie Database, http://imdb.com/
6 Last.fm, The Social Music Revolution, http://www.last.fm/
7 Lucene, An Open Source Information Retrieval Library, http://lucene.apache.org/
          </p>
          <p>
            The structure of the paper is the following. Section 2 briefly describes our
approach for representing user preferences and item features using ontology-based
knowledge structures, and how they are exploited by several recommendation models.
Section 3 explains mechanisms to automatically relate and transform social tagging
and external semantic information into our ontological knowledge structures. A real
implementation and evaluation of the previous tag transformation and
recommendation processes within a news recommender system are presented in
section 4. Finally, section 5 presents some conclusions and future research lines.
2 Hybrid recommendations
In this section, we summarise the ontology-based knowledge representation and
recommendation models into which filtered social tags are proposed to be integrated
and exploited.
2.1 Ontology-based representation of item features and user preferences
In the knowledge representation we propose [
            <xref ref-type="bibr" rid="ref13 ref22 ref31 ref4">4, 13</xref>
            ], user preferences are described as
vectors um = (um,1, um,2, ..., um,K), where um,k ∈ [0,1] measures the intensity of the
interest of user um ∈ U in concept ck ∈ O (a class or an instance) in a domain
ontology O, K being the total number of concepts in the ontology. Similarly, items
dn ∈ D are assumed to be annotated by vectors dn = (dn,1, dn,2, ..., dn,K) of concept
weights, in the same vector space as user preferences.
          </p>
          <p>
            The main advantages of this knowledge representation are its portability, thanks to
the XML-based Semantic Web standards, the domain independence of the subsequent
content retrieval and recommendation algorithms, and the multi-source nature of the
proposal (different types of media could be annotated: texts, images, videos).
2.2 Personalised content retrieval
Our notion of content retrieval is based on a matching algorithm that provides a
personal relevance measure pref(dn, um) of an item dn for a user um. This measure
is set according to the semantic preferences of the user and the semantic annotations of
the item, and is based on a cosine vector similarity cos(dn, um). The obtained similarity
values (Personalised Ranking module of Figure 1) can be combined with
non-personalised query-based scores sim(dn, q) and semantic context information (Item
Retrieving module of Figure 1) to produce combined rankings [
            <xref ref-type="bibr" rid="ref13 ref31">13</xref>
            ].
          </p>
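The personal relevance measure pref(dn, um) described above reduces to a cosine similarity over concept-weight vectors, which can be sketched as follows; the vector values here are invented for illustration.

```python
import math

def cosine(u, d):
    """Cosine similarity between two concept-weight vectors in the same space."""
    dot = sum(a * b for a, b in zip(u, d))
    nu = math.sqrt(sum(a * a for a in u))
    nd = math.sqrt(sum(b * b for b in d))
    return dot / (nu * nd) if nu and nd else 0.0

user = [0.8, 0.0, 0.5]   # interest weights for concepts c1..c3
item = [0.9, 0.1, 0.0]   # annotation weights over the same concepts

pref = cosine(user, item)  # personal relevance pref(d, u), in [0, 1] here
```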
          <p>
            To overcome sparsity in user profiles, we propose a preference spreading
mechanism, which expands the initial set of preferences stored in user profiles
through explicit semantic relations with other concepts in the ontology. Our approach is
based on Constrained Spreading Activation (CSA), and is self-controlled by applying a
decay factor to the intensity of preference each time a relation is traversed. We have
empirically demonstrated [
            <xref ref-type="bibr" rid="ref13 ref21 ref3 ref31">3, 13</xref>
            ] that preference extension improves retrieval precision
and recall. It also helps to mitigate other well-known limitations of recommender
systems such as the cold-start, overspecialisation and portfolio effects.
2.3 Context-aware recommendations
The context is represented in our approach [
            <xref ref-type="bibr" rid="ref13 ref31">13</xref>
            ] as a set of weighted ontology concepts.
This set is obtained by collecting the concepts that have been involved in the interaction
of the user (e.g. accessed items) during a session. It is built in such a way that the
importance of concepts fades away with time by a decay factor. Once the context is
built, a contextual activation of user preferences is achieved by finding semantic paths
linking preferences to context. These paths are made of existing relations between
concepts in the ontologies, following the spreading technique mentioned in section 2.2.
2.4 Group-oriented recommendations
The presented user profile representation allows us to easily model groups of users. We
have explored the combination of ontology-based profiles for this purpose [
            <xref ref-type="bibr" rid="ref21 ref3">3</xref>
            ],
on a per-concept basis, following different strategies from social choice theory. In our
approach, user profiles are merged to form a shared group profile, so that common
content recommendations are generated according to this new profile.
2.5 Multi-facet hybrid recommendations
In order to make hybrid recommendations, we cluster the semantic space based on the
correlation of concepts appearing in the profiles of individual users. The obtained
clusters Cq represent groups of preferences (topics of interest) shared by a significant
number of users. Using these clusters, profiles are partitioned into semantic segments.
Each of these segments corresponds to a cluster and represents a subset of the user
interests that is shared by the users who contributed to the clustering process. By thus
introducing further structure in user profiles, we define relations among users at
different levels, obtaining multilayered communities of interest.
          </p>
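The preference-spreading idea of section 2.2 might be sketched roughly as follows, assuming an invented ontology graph and a simple depth-limited propagation; the paper's actual CSA constraints are richer than this sketch.

```python
def spread_preferences(prefs, relations, decay=0.5, depth=2):
    """prefs: {concept: weight}; relations: {concept: [related concepts]}.
    Propagates each weight along relations, damped by `decay` per hop."""
    expanded = dict(prefs)
    frontier = dict(prefs)
    for _ in range(depth):
        nxt = {}
        for concept, w in frontier.items():
            for rel in relations.get(concept, []):
                propagated = w * decay
                # Keep only propagations that raise a concept's weight.
                if propagated > expanded.get(rel, 0.0):
                    nxt[rel] = max(nxt.get(rel, 0.0), propagated)
        expanded.update(nxt)
        frontier = nxt
    return expanded

prefs = {"jazz": 1.0}                                   # explicit preference
relations = {"jazz": ["music"], "music": ["concerts"]}  # toy ontology relations

out = spread_preferences(prefs, relations)
# jazz 1.0 -> music 0.5 -> concerts 0.25
```

The decay factor makes the expansion self-controlled, as the text describes: interest fades with each traversed relation, so distant concepts receive only weak inferred preference.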
          <p>
            Exploiting the relations of the communities which emerge from the users’ interests,
and combining them with item semantic information, we have presented in [
            <xref ref-type="bibr" rid="ref2 ref20">2</xref>
            ] several
recommendation models that compare the current user's interests with those of other
users in two ways: first, according to item characteristics, and second, according to
connections among user interests, in both cases at different semantic layers:
pref(dn, um) = Σq nsim(dn, Cq) · Σi nsimq(um, ui) · simq(dn, ui)
3 Relating social tags to ontological information
Parallel to the proliferation and growth of social tagging systems, the research
community is increasing its efforts to analyse the complex dynamics underlying
folksonomies, and to investigate the exploitation of this phenomenon in multiple
domains. Results reported in [
            <xref ref-type="bibr" rid="ref23 ref5">5</xref>
            ] suggest that users of social systems share behaviours
which appear to follow simple tagging activity patterns. Understanding, predicting
and controlling the semiotic dynamics of online social systems are the base pillars for
a wide variety of applications.
          </p>
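The multi-facet hybrid score pref(dn, um) = Σq nsim(dn, Cq) · Σi nsimq(um, ui) · simq(dn, ui) of section 2.5 can be sketched directly; all similarity values below are invented placeholders, not data from the paper.

```python
def hybrid_pref(item_cluster_sim, user_user_sim, item_user_sim):
    """item_cluster_sim[q]  = nsim(d_n, C_q)
    user_user_sim[q][i]     = nsim_q(u_m, u_i)
    item_user_sim[q][i]     = sim_q(d_n, u_i)
    Returns pref(d_n, u_m) per the formula above."""
    total = 0.0
    for q, nsim_dq in enumerate(item_cluster_sim):
        inner = sum(uu * iu for uu, iu in zip(user_user_sim[q], item_user_sim[q]))
        total += nsim_dq * inner
    return total

# Two semantic clusters, two neighbour users per cluster.
item_cluster_sim = [0.6, 0.2]                # nsim(d_n, C_q)
user_user_sim = [[0.9, 0.5], [0.1, 0.3]]     # nsim_q(u_m, u_i)
item_user_sim = [[0.8, 0.4], [0.2, 0.6]]     # sim_q(d_n, u_i)

score = hybrid_pref(item_cluster_sim, user_user_sim, item_user_sim)
# q=0 contributes 0.6*(0.9*0.8 + 0.5*0.4); q=1 contributes 0.2*(0.1*0.2 + 0.3*0.6)
```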
          <p>
            For these purposes, the establishment of a common vocabulary (set of tags) shared
by users in different social systems is a desirable situation. Indeed, recent works have
focused on the improvement of tagging functionalities to generate tag datasets in a
controlled, coordinated way. P-TAG [
            <xref ref-type="bibr" rid="ref24 ref6">6</xref>
            ] is a method that automatically generates
personalised tags for web pages, producing keywords relevant both to their textual
content and to data collected from the user’s browsing. In [
            <xref ref-type="bibr" rid="ref26 ref8">8</xref>
            ], an adaptation of
user-based collaborative filtering and a graph-based recommender is presented as a tag
recommendation mechanism that eases the process of finding good tags for a
resource and consolidates the creation of a consistent tag vocabulary across users.
          </p>
          <p>
            The integration of folksonomies and the Semantic Web has been envisioned as an
alternative approach to the collaborative organisation of shared tagging information.
The proposal presented in [
            <xref ref-type="bibr" rid="ref11 ref29">11</xref>
            ] uses a combination of pre-processing strategies and
statistical techniques together with knowledge provided by ontologies for making
explicit the semantics behind the tag space in social tagging systems.
          </p>
          <p>In the work presented herein, we propose the use of knowledge structures defined
by multiple domain ontologies as a common semantic layer to unify and classify
social tags from several Web 2.0 sites. More specifically, we propose a mechanism
for the creation of ontology instances for the gathered tags, according to semantic
information collected from the Web. Our method links tagging information to ontological
structures through a sequence of three processing steps:</p>
          <p>
            • Filtering social tags: To facilitate the integration of information from different
social sources, as well as the subsequent translation of that information into
ontological knowledge, a pre-processing of the tags is needed, associating them
with a common vocabulary shared by the different applications involved.
Morphologic and semantic transformations of tags are performed at this stage
based on the WordNet English dictionary [
            <xref ref-type="bibr" rid="ref27 ref9">9</xref>
            ], the Wikipedia8 encyclopaedia and
the Google9 web search engine.
          </p>
          <p>• Obtaining semantic information about social tags: The shared vocabulary is
created with the use of Wikipedia, which provides semantic information about
millions of concepts.</p>
          <p>• Categorisation of social tags into ontology classes: Once the tags have been
filtered and mapped to a shared vocabulary, they are automatically converted
into instances of classes of domain ontologies. Again, semantic categorisation
information available in Wikipedia is exploited in this process.</p>
          <p>These steps are explained in more detail in the next subsections.
8 Wikipedia, The Free Encyclopaedia, http://en.wikipedia.org/
9 Google, Web Search Engine, http://www.google.com/
3.1 Filtering social tags
Raw tagging information can be noisy and inconsistent. When manual tags are
introduced with a non-controlled tagging mechanism, people often make grammatical
mistakes (e.g. barclona instead of barcelona), tag concepts indistinctly in singular,
plural or derived forms (blog, blogs, blogging), sometimes add adjectives, adverbs,
prepositions or pronouns to the main concept of the tag (beautiful car, to read), or use
synonyms and acronyms that could be converted into a single tag (biscuit and cookie,
ny and new york). Moreover, the tag encoding and storage mechanisms used by social
systems often alter the tags introduced by the users: they may transform white spaces
(san francisco, san-francisco, san_francisco, sanfrancisco) and special characters in
the tags (los angeles for los ángeles, zurich instead of zürich), etc.</p>
          <p>Thus, while it is possible to gather information from multiple folksonomy sites, such
as Flickr or del.icio.us, inconsistency will lead to confusion and loss of information
when tagging data is compared. For example, if a user has tagged photos from a recent
holiday in New York with nyc, but also bookmarked relevant pages in del.icio.us with
new_york, the correlation will be lost. In order to facilitate the folksonomy data analysis
and integration, tags have to be filtered and mapped to a shared vocabulary. Here, we
present a tag filtering architecture that makes use of external knowledge resources such
as the WordNet dictionary, Wikipedia encyclopaedia and Google web search engine.</p>
          <p>The filtering process is a sequential execution where the output from one filtering
step is used as input to the next. The output of the entire filtering process is a set of new
tags that correspond to an agreed representation. As will be explained below, this is
achieved by correlating tags to entries in two large knowledge resources: Wordnet and
Wikipedia. Wordnet is a lexical database and thesaurus that groups English words into
sets of cognitive synonyms called synsets, providing definitions of terms, and modelling
various semantic relations between concepts: synonym, hypernym, hyponym, among
others. Wikipedia is a multilingual, open-access, free-content encyclopaedia on the
Internet. Using a wiki style of collaborative content writing, it has grown to become one
of the largest reference Web sites with over 75,000 active contributors, maintaining
approximately 9,000,000 articles in over 250 languages (as of February 2008).
Wikipedia contains collaboratively generated categories that classify and relate entries,
and also supports term disambiguation and dereferencing of acronyms.</p>
          <p>Figure 2 provides a visual representation of the filtering process where a set of raw
tags are transformed into a set of filtered tags and a set of discarded tags. Each of the
numbers in the diagram corresponds to a step outlined below.</p>
          <p>For this work, tags from public available user accounts from Flickr and del.icio.us
sites have been collected and filtered. A total of 1004 user profiles have been gathered
from these two systems, providing 149,529 and 84,851 distinct tags respectively.
Initially, the intersection between both datasets was 28,550 common tags.
Step 1: Lexical filtering
After raw tags have been harvested from different folksonomy sites, they are passed
to the Lexical Filter, which applies several filtering operations. Tags that are too small
(with length = 1) or too large (length &gt; 25) are removed, resulting in a discarding rate of
approximately 3% of the initial dataset. In addition, considering the discrepancies in the
use of special characters (such as accents, dieresis and caret symbol), we convert such
special characters to a base form (e.g., the characters à, á, â, ã, ä, å are converted to a).</p>
          <p>Tags containing numbers are also filtered based on a set of custom heuristics. For
example, to maintain salient numbers, such as dates (2006, 2007, etc), common
references (911, 360, 666, etc), or combinations of alphanumeric characters (7 up,
4 x 4, 35 mm), we discard unpopular tags below a certain global tag frequency
threshold. Finally, common stop-words, such as pronouns, articles, prepositions and
conjunctions are removed. After lexical filtering, tags are passed on to the Wordnet
Manager. If a tag has an exact match in Wordnet, we pass it on directly to the set of
filtered tags, to save further unnecessary processing.</p>
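          <p>Step 1 can be sketched as a simple predicate. The stop-word list and the popularity threshold below are illustrative stand-ins; the actual lists and thresholds used by the system are not reproduced here:

```python
# Illustrative stop-word list; the real system uses a fuller list.
STOP_WORDS = {"the", "a", "an", "to", "of", "in", "on", "and", "or"}

def passes_lexical_filter(tag, global_frequency, min_freq=5):
    """Apply the Step 1 rules: discard too-short/too-long tags and
    stop words; keep numeric tags only when globally popular."""
    if len(tag) == 1 or len(tag) > 25:
        return False
    if tag in STOP_WORDS:
        return False
    if any(ch.isdigit() for ch in tag):
        # salient numbers (2007, 911, 35 mm, ...) survive via popularity
        return global_frequency >= min_freq
    return True

print(passes_lexical_filter("barcelona", 1))  # True
print(passes_lexical_filter("2007", 120))     # True
print(passes_lexical_filter("the", 9999))     # False
```

Tags that survive this predicate would then be checked against Wordnet, as described above.</p>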
          <p>
            Step 2: Compound nouns and misspellings
If a tag is not found in Wordnet, we consider possible misspellings and compound
nouns. Motivated by [
            <xref ref-type="bibr" rid="ref11 ref29">11</xref>
            ], to solve these problems, we make use of the Google “did you
mean” mechanism. When a search term is entered, the Google engine checks whether
more relevant search results are found with an alternative spelling. Because Google’s
spell check is based on occurrences of all words on the Internet, it is able to suggest
common spellings for proper nouns that would not appear in a standard dictionary.
          </p>
          <p>The Google “did you mean” mechanism also provides an excellent way to resolve
compound nouns. Since most tagging systems prevent users from entering white spaces
into the tag value, users create compound nouns by concatenating nouns together or
delimiting them with a non-alphanumeric character such as _ or -, which introduces an
obvious source of complication when aligning folksonomies. By sending compound
nouns to Google, we easily resolve the tag into its constituent parts. This mechanism
works well for compound nouns with two terms, but is likely to fail if more than two
terms are used. For example, the tag sanfrancisco is corrected to san francisco, but the
tag unitedkingdomsouthampton is not resolved by Google.</p>
          <p>We have thus developed a complementary algorithm that quickly and accurately
splits compound nouns of three or more terms. The main idea is to firstly sort the tags
in alphabetical order, and secondly process the generated tag list sequentially. By
caching previous lookups, and matching the first shared characters of the current tag
string, we are able to split it into a prefix (previously resolved by Google) and a
postfix. A second lookup is then made using the postfix to seek further possible
matches. The process is iteratively repeated until no splits are obtained from our
Google Connector. Compared to a bespoke string-splitting heuristic, this process has a
very low computational cost. This mechanism successfully recognizes long compound
nouns such as war of the worlds, lord of the rings, and martin luther king jr.</p>
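          <p>The iterative splitting can be approximated as follows. This is a simplified recursive rendering of the sorted-list-with-caching procedure, not the exact implementation; the set known stands in for terms already resolved through the Google Connector:

```python
def split_compound(tag, known):
    """Split a concatenated compound noun into terms from 'known',
    trying the longest matching prefix first and recursing on the
    postfix; returns None when no complete split exists."""
    if tag == "":
        return []
    for cut in range(len(tag), 0, -1):
        prefix = tag[:cut]
        if prefix in known:
            rest = split_compound(tag[cut:], known)
            if rest is not None:
                return [prefix] + rest
    return None

known = {"war", "of", "the", "worlds", "san", "francisco"}
print(split_compound("waroftheworlds", known))  # ['war', 'of', 'the', 'worlds']
print(split_compound("sanfrancisco", known))    # ['san', 'francisco']
```

Tags with no complete split, such as the unresolved unitedkingdomsouthampton example with an incomplete vocabulary, yield None and remain pending.</p>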
          <p>Similarly to Step 1, after using Google to check for misspellings and compound
nouns, the results are validated against the Wordnet Manager. Unprocessed tags are
added to the pending tag stack, and unmatched tags are discarded.</p>
          <p>Step 3: Wikipedia correlation
Many of the popular tags occurring in community tagging systems do not appear in
standard dictionaries, such as Wordnet, because they correspond to proper names
(such as famous people, places, or companies), contemporary terminology (such as
web2.0 and podcast), or are widely used acronyms (such as asap and diy).</p>
          <p>In order to provide an agreed representation for such tags, we correlate tags to their
appropriate Wikipedia entries. For example, when searching the tag nyc in Wikipedia,
the entry for New York City is returned. The advantage of using Wikipedia to agree on
tags from folksonomies is that Wikipedia is a community-driven knowledge base, much
like folksonomies are, so that it rapidly adapts to accommodate new terminology.</p>
          <p>Apart from consolidating agreed terms for the filtered tags, our Wikipedia
Connector retrieves semantic information about each obtained entry. Specifically, it
extracts ambiguous concepts (e.g., “java programming language” and “java island”
for the entry “java”), and collaboratively generated categories (e.g., “living people”,
“film actors” and “american male models” for the entry “brad pitt”). This information
is exploited by the ontology population and annotation processes described below.
Step 4: Morphologically similar terms
An additional issue to be considered during the filtering process is that users often use
morphologically similar terms to refer to the same concept. One very common example
of this is the discrepancy between singular and plural terms, such as blog and blogs,
and other morphological deviations (e.g. blogging). In this step, using a custom
singularisation algorithm, and the stemming functions provided by the Snowball
library10, we reduce morphologically similar tags to a single tag. For each group of
similar tags, the shortest term found in Wordnet is used as the representative tag.
Step 5: WordNet synonyms
When people communicate a certain concept, they often use synonyms, i.e., terms that
have the same meaning, but with different morphological forms. A natural filtering step
is the simplification of the tag sets by merging pairs of synonyms into single terms.</p>
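          <p>Steps 4 and 5 can be sketched together. The toy stemmer below stands in for the Snowball library, and the in_wordnet predicate for the Wordnet Manager; the real system additionally merges only non-ambiguous Wordnet synonym pairs:

```python
from collections import defaultdict

def crude_stem(tag):
    """Tiny suffix stripper standing in for the Snowball stemmer."""
    for suffix in ("ing", "es", "s"):
        if tag.endswith(suffix) and len(tag) - len(suffix) >= 3:
            tag = tag[: len(tag) - len(suffix)]
            break
    if len(tag) >= 4 and tag[-1] == tag[-2]:  # blogg -> blog
        tag = tag[:-1]
    return tag

def group_similar(tags, in_wordnet):
    """Map each tag to one representative per stem group: the
    shortest member accepted by the in_wordnet predicate."""
    groups = defaultdict(list)
    for t in tags:
        groups[crude_stem(t)].append(t)
    representative = {}
    for members in groups.values():
        candidates = [m for m in members if in_wordnet(m)] or members
        rep = min(candidates, key=len)
        for m in members:
            representative[m] = rep
    return representative

rep = group_similar(["blog", "blogs", "blogging"], lambda t: t == "blog")
print(rep)  # all three tags map to 'blog'
```

The same grouping map can then be reused when merging the non-ambiguous synonym pairs of Step 5, replacing each pair by its most popular member.</p>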
          <p>WordNet provides synonym relations between synsets of the terms. However, due
to ambiguous meanings of the tags, not all of them can be taken into consideration,
and the filtering process must be very carefully executed. Our merging process
comprises three stages. In the first stage, a matrix of synonym relations is created by
using Wordnet. In the second stage, according to the number of synonym relations
found for each tag, we identify the non-ambiguous synonym pairs, and finally, stage
three replaces each of the synonym pairs by the term that is most popular. Examples
of thus processed synonym pairs are android and humanoid, thesis and dissertation,
funicular and cable railway, stein and beer mug, or poinsettia and christmas flower.
10 Snowball, String-handling Language, http://snowball.tartarus.org/
3.2 Obtaining semantic information about social tags</p>
          <p>
In order to populate ontologies with concepts associated to the filtered social tags,
general multi-domain semantic knowledge is needed. In this work, as mentioned
before, we propose to extract that information from Wikipedia. The Wikipedia articles
describe a number of different types of entities: people, places, companies, etc.,
providing descriptions, references, and even images about the described entities.</p>
          <p>Many of these entities are ambiguous, having several meanings for different
contexts. For instance, the same tag “java” could be assigned to a Flickr picture of the
Pacific island, or a del.icio.us page about the programming language. One approach to
address tag disambiguation is by using the information available in Wikipedia. A
Wikipedia article is fairly structured: the title of the page is the entity name itself (as
found in Wikipedia), the content is divided into well delimited sections, and a first
paragraph is dedicated to possible disambiguation options for the corresponding term.
For example, the page of the entry “apple” starts as follows:
• “This article is about the fruit…”
• “For the Beatles multimedia corporation, see…”
• “For the technology company, see…”</p>
          <p>Apart from these elements, every article contains a set of collaboratively generated
categories. Hence, for example, the categories created for the concept “teide” are:
world heritage sites in spain, tenerife, mountains of spain, volcanoes of spain, national
parks of spain, stratovolcanoes, hotspot volcanoes, and decade volcanoes. By
processing this information, we might infer that “teide” is a volcano in Spain.</p>
          <p>Disambiguation and categorisation information have been therefore extracted from
Wikipedia for every concept appearing in our social tag datasets. Once the most
suitable category for a term is determined, we match its relevant categories to classes
defined in the domain ontologies, as explained next.
3.3 Categorisation of social tags into ontology classes</p>
          <p>The assignment of an ontology class to a Wikipedia entry is based on a morphological
matching between the name and the categories of the entry, and the names of the
ontology classes. The ontology classes with names most similar to the name and
categories of the entry are chosen as the classes of which the corresponding individual
(instance) is to be created. The created instances are assigned a URI containing the
entry name, and are given RDFS labels with the Wikipedia categories.</p>
          <p>To better explain the proposed matching method, let us consider the following
example. Let “brad pitt” be the concept we wish to instantiate. If we look up this
concept in Wikipedia, a page with information about the actor is returned. At the end of
the page, several categories are shown: “action film actors”, “american film actors”,
“american television actors”, “best supporting actor golden globe (film)”, “living
people”, “missouri actors”, “oklahoma (state) actors”, “american male models”, etc.</p>
          <p>After retrieving that information, all the terms (tokens) that appear in the name and
categories of the entry (which we will henceforth refer to as entry terms) are
morphologically compared with the names of the ontology classes (assuming that a
class-label mapping is available, as is usually the case). Computing the Levenshtein
distance, and applying singularisation and stemming mechanisms, only the entry terms
that match some class name above a certain distance threshold are kept; the rest are
discarded.
For instance, suppose that “action”, “actor”, “film”, “people”, and “television” are the
ones sufficiently close to some ontology class. To select the most appropriate ontology
class among the matching ones, we firstly create a vector whose coordinates correspond
to the filtered entry terms, taking as value the number of times the term appears in the
entry name and categories together. In the example, the vector might be as follows:
{(action, 1), (actor, 6), (film, 3), (people, 1), (television, 1)}, assuming that “actor”
appears in six categories of the Wikipedia entry “brad pitt”, and so forth.</p>
          <p>Once this vector has been created, one or more ontology classes are selected by the
following heuristic:
1. If a single coordinate holds the maximum value in the vector, we select the
ontology class that matches the corresponding term.
2. In case of a tie between several coordinates having the maximum value, a new
vector is created, containing the matched classes plus their taxonomic ancestor
classes in the ontologies. Then the weight of each component is computed as the
number of times the corresponding class is found in this step. Finally, the original
classes that have the highest-valued ancestor in the new vector are selected.</p>
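          <p>The two-case heuristic can be rendered in a few lines of Python. Here matched_class and ancestors are toy stand-ins for the morphological class matching and for the ontology taxonomy, with illustrative ancestor links:

```python
from collections import Counter

def select_classes(term_vector, matched_class, ancestors):
    """Case 1: a unique maximum selects its matching class.
    Case 2: on a tie, weight the tied classes plus their taxonomic
    ancestors, then keep the tied classes under the heaviest ancestor."""
    top = max(term_vector.values())
    winners = [t for t, v in term_vector.items() if v == top]
    if len(winners) == 1:
        return [matched_class[winners[0]]]
    weights = Counter()
    for t in winners:
        c = matched_class[t]
        weights[c] += 1
        for a in ancestors.get(c, []):
            weights[a] += 1
    heaviest = max(weights, key=weights.get)
    return [matched_class[t] for t in winners
            if heaviest in ancestors.get(matched_class[t], [])]

cls = {"actor": "Actor", "film": "Film", "people": "Person"}
anc = {"Actor": ["CinemaIndustry"], "Film": ["CinemaIndustry"],
       "Person": ["Mammal"]}
print(select_classes({"actor": 6, "film": 3, "people": 1}, cls, anc))
# ['Actor']
print(select_classes({"actor": 1, "film": 1, "people": 1}, cls, anc))
# ['Actor', 'Film']
```

The two calls reproduce the worked examples of the text: a unique maximum selects “Actor”, while the tie is broken through the shared “cinema industry” ancestor.</p>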
          <p>Here “ontology class” and “ancestor” denote a loose notion admitting a broad
range of taxonomic constructs, ranging from informally built subject hierarchies (such
as the ones defined in the Open Directory tree or, in our experiments, the IPTC
Subjects), to pure ontology classes in a strict Description Logic sense.</p>
          <p>In our example, the weight for the term “actor” is the highest, so we select its
matching class as the category of the entry. Thus, assuming that the class matching
this term was “Actor”, we finally define “Brad Pitt” as an instance of “Actor”.</p>
          <p>Now suppose that, instead, the vector for Brad Pitt was {(actor, 1), (film, 1), (people,
1)}. In that case, there would be a tie in the matching classes, and we would apply the
second case of the heuristic. We take the ancestor classes, which could be e.g. “cinema
industry” for “actor”, “cinema industry” for “film”, and “mammal” for “person”, and
create a weighted list with the original and ancestor classes. Then we count the number
of times each class appears in the previous list, and create the new vector: {(actor, 1),
(film, 1), (person, 1), (cinema industry, 2), (mammal, 1)}. Since the class “cinema
industry” has the highest weight, we finally select its sub-classes “actor” and “film” as
the classes of the instance “brad pitt”.</p>
          <p>
            We must note that our ontology population mechanism does not necessarily
generate individuals following a strict semantic “is-a” schema, but a more relaxed
semantic “is-related-to” association principle. This is not a problem for our final
purposes in personalised content retrieval, since the annotation and recommendation
methods in that area are themselves rooted in models of an inherently approximate
nature, e.g. regarding the relationships between concepts and item contents.
4 Preliminary evaluations
Recent works show an increasing interest in using social tagging information to enhance
personalised content retrieval and recommendation. FolkRank [
            <xref ref-type="bibr" rid="ref25 ref7">7</xref>
            ] is a search algorithm
that exploits the structure of folksonomies to find communities and organise search
results. The recommender system presented in [
            <xref ref-type="bibr" rid="ref10 ref28">10</xref>
            ] suggests web pages available on the
Internet, by using folksonomy and social bookmarking information. The movie
recommender proposed in [
            <xref ref-type="bibr" rid="ref12 ref30">12</xref>
            ] is built on keywords assigned to movies via collaborative
tagging, and demonstrates the feasibility of making accurate recommendations based on
the similarity of item keywords to those of the user’s rating tag-clouds.
          </p>
          <p>In the following, we present and preliminary evaluate how our ontological
knowledge representation, recommendation models, and tag filtering and matching
strategies are integrated in News@hand, a news recommender system.
4.1 News@hand
News@hand is a news recommender system that describes news contents and user
preferences with a controlled and structured vocabulary, using semantic-based
technologies, and integrating the recommendation models described in section 2.
Figure 3 depicts how ontology-based item descriptions and user profiles are created and
exploited by the system.</p>
          <p>News items are automatically and periodically retrieved from several on-line news
services via RSS feeds. The title and summary of the retrieved news are annotated with
concepts of the domain ontologies. A dynamic graphic interface allows the system to
automatically retrieve all the users’ inputs in order to analyse their behaviour with the
system, update their preferences, and adjust the recommendations in real time.</p>
          <p>Figure 4 shows a screenshot of a typical news recommendation page in News@hand.
The news items are classified into eight different sections: headlines, world, business,
technology, science, health, sports and entertainment. When the user is not logged into
the system, s/he can browse any of the previous sections, but the items are listed without
any personalised criterion. On the other hand, when the user is logged into the system,
recommendation and profile edition functionalities are activated, and the user can
browse the news according to his and others’ preferences in different ways. Click
history is used to detect the short term user interests, which represent the dynamic
semantic context exploited by our personalised content retrieval mechanism.
– It could be union of and (and more): From the other perspective (see
also previous template), this template provides different types of the item.</p>
          <p>The player can extend this template by adding more items.
– It is complement of the : This template provides complement
objects/concepts of the item.
– It is disjoint with (opposite of) : This template provides the
objects/concepts that are disjoint with the item.
– It is equivalent to the : This template provides equivalent objects/concepts
to the item.</p>
          <p>Note that in this phase we have also the notion of prohibited statements.
Prohibited statements are actually those statements that most players decide to
choose first. we are not interested to collect these statements all time, so we do
not give the opportunity to the player (narrator) to use them.</p>
          <p>However, the players should use these templates, as we build OWL ontologies
by aid of these templates, but we give also the option to the narrator to build
arbitrary sentences as well, if the templates can not be useful. These arbitrary
sentences will build comments for the generated ontology.
3 Generating OWL-based Ontology</p>
          <p>In this section, we introduce the translation mechanism that we use to generate
OWL-based ontologies. The ontology is created for the object that the
players are playing with, e.g. book, computer, car. After every play involving an
object (item), we collect some common sense facts about that item and can build
an ontology from them. The first iteration of ontology generation is a draft and
cannot be considered a complete ontology. In other words, the ontology is created
over several iterations and not in a single pass.
For the approved properties, i.e. the properties whose frequencies exceed
a threshold, an owl:Class is generated. These classes are the
transformation of the properties into an OWL representation using a mediator/mapper, which
is simply able to generate classes and their properties. Suppose a domain like a
book: for every approved property or concept, a class and a link are generated
to associate this class with the main concept, which in our example is a book. Figure 1
illustrates the mapping between some selected properties and their OWL
representations. As we mentioned earlier, the properties are stored in a knowledge
base (KB) and, as soon as they are mature enough to be linked, the mapper
translates them into OWL and links them to the main concept. In the following
sections, we provide a more detailed description.</p>
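          <p>A minimal mapper of this kind can be sketched as follows. This is a hypothetical rendering; the exact OWL shape produced by the mapper of Figure 1 is not reproduced here, and the has_ naming of the linking property is our own assumption:

```python
import xml.etree.ElementTree as ET

OWL = "{http://www.w3.org/2002/07/owl#}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"

def property_to_owl(main_concept, prop):
    """Generate an owl:Class for an approved property and an object
    property linking it to the main concept."""
    cls = ET.Element(OWL + "Class", {RDF + "ID": prop})
    link = ET.Element(OWL + "ObjectProperty", {RDF + "ID": "has_" + prop})
    ET.SubElement(link, RDFS + "domain", {RDF + "resource": "#" + main_concept})
    ET.SubElement(link, RDFS + "range", {RDF + "resource": "#" + prop})
    return ET.tostring(cls).decode() + "\n" + ET.tostring(link).decode()

print(property_to_owl("book", "author"))
```

For the book example, this emits a class for author together with a has_author property whose domain is the main concept book.</p>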
          <p>Fig. 1. Generating OWL for Properties Using a Mapper/Mediator
Pre-Refinement of Concepts (Refining Before Mediation). As we
mentioned earlier, the concepts need to be refined. The refinement process is as
follows: because a specific object can be played more than once, we assign a
counter to every object, and the counter increases whenever players play that
object. We call this counter objectCounter, where the word object is
replaced with the explicit name of the object. A counter is also assigned to every
property that the players agree upon during the game; upon further
agreement by other players, the counter increases. We call this counter
objectPropertyCounter, where object is replaced with the explicit name of the
object and property with the explicit name of the property of
the object. The variance defined for each property is calculated as
objectCounter minus objectPropertyCounter. If the result is greater than threshold1,
the property is moved to the prohibited list, as many pairs agreed upon that
property, and if it is less than threshold2, the property is deleted, as only very
few pairs agreed upon it. Note that we do not distinguish between uppercase
and lowercase letters. Listing 1.1 demonstrates the pseudocode of
this refinement.</p>
          <p>Listing 1.1. Pseudocode of Refining Concepts
if (object is selected) then
    objectCounter++;
    if (objectProperty is selected) then
        objectPropertyCounter++;
variance(objectProperty) = objectCounter - objectPropertyCounter;
if (normalize(variance(objectProperty)) &gt; threshold1) then
    move objectProperty to prohibited list;
if (normalize(variance(objectProperty)) &lt; threshold2) then
    delete objectProperty;
Concept Mediator/Mapper. Concept mediator/mapper is simply a mapper
that gets the property or concept as input and generates OWL statements as
output. The OWL statement contains also the link that associates the property
to the main object. Figure 1 demonstrates some sample inputs and outputs of
the mediator/mapper. However, in this step we do not yet have our ontology;
we have only gathered the properties and built their links. The ontology will be
created after gathering sufficient facts about the object.</p>
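          <p>The decision rule of Listing 1.1 can be expressed compactly in Python. Since the paper does not define normalize, we assume here, purely for illustration, that it divides the variance by objectCounter; the threshold values are likewise illustrative:

```python
def refine_property(object_counter, property_counter,
                    threshold1=0.8, threshold2=0.2):
    """Keep, delete, or prohibit a property, following Listing 1.1."""
    variance = object_counter - property_counter
    # Assumed normalisation: variance as a fraction of total plays.
    normalized = variance / object_counter if object_counter else 0.0
    if normalized > threshold1:
        return "prohibit"
    if threshold2 > normalized:
        return "delete"
    return "keep"

print(refine_property(10, 1))  # prohibit
print(refine_property(10, 9))  # delete
print(refine_property(10, 5))  # keep
```

The same rule, with objectCounter2 and objectTInstanceCounter, applies to the instance refinement described later.</p>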
          <p>Post-Refining (Refinement After Mediation). After generating the OWL
representations of properties, they also need to be purified. Refining statements is
an iterative task and tries to build a summarized version of statements based on
resource URIs. Figure 2 demonstrates a sample of this post-refinement.
In the previous section, we presented the fixed templates that we use to gather
common sense facts about objects. As we mentioned, those templates were
carefully chosen for two main purposes: First, to be able to be translated into OWL
using a mediator/mapper and second, to avoid the game being boring, as we need
to entertain players, instead of assigning tasks to them. Table 1 demonstrates the
general translation of templates. Note that &amp;xsd; refers to XSD namespace which
is actually xmlns:xsd = ”http://www.w3.org/2001/XMLSchema#”. To avoid a
huge messy table, we decided to use acronyms.</p>
          <p>Table 1. General translation of the templates into OWL; the OWL column is shown where recoverable:
Template: X has at least Y
Template: X has at most Y
Template: X is kind of …
Template: X could be either … or … (or more)
Template: X could be union of … and … (and more)
Template: X is complement of …
Template: X is disjoint with (opposite of) …, translated as
&lt;owl:Class rdf:ID="some concept"/&gt;
&lt;owl:Class rdf:ID="X"&gt;
  &lt;owl:disjointWith&gt;
    &lt;owl:Class rdf:about="#some concept"/&gt;
  &lt;/owl:disjointWith&gt;
&lt;/owl:Class&gt;
Template: X is equivalent to …, translated as
&lt;owl:Class rdf:ID="some concept"/&gt;
&lt;owl:Class rdf:ID="X"&gt;
  &lt;owl:equivalentClass&gt;
    &lt;owl:Class rdf:about="#some concept"/&gt;
  &lt;/owl:equivalentClass&gt;
&lt;/owl:Class&gt;</p>
          <p>
Pre-Refinement of Statements (Refinement Before Mediation). The
main goal of Pre-Refinement is to select the statements that can be translated
into correct OWL statements. The process is as follows: like the previous
refinement, we assign a counter to the object; we call this counter objectCounter2.
We also assign a counter to every instance of a template related to the object;
we call this counter objectTInstanceCounter. We log all instances that are sent
to the guesser. If an instance was helpful and the guesser could guess the word
correctly, we increase objectTInstanceCounter, but if the instance was not useful
and the guesser was not able to guess the word, we decrease
objectTInstanceCounter. We compare objectTInstanceCounter with some thresholds and then
decide whether to keep the instance, delete it, or move it into the prohibited list.
Note that in this refinement, we do not distinguish between uppercase and
lowercase letters. Listing 1.2 demonstrates the pseudocode of this refinement.</p>
          <p>Listing 1.2. Pseudocode of Refining Instances
if (object is selected) then
    objectCounter2++;
    if (objectTInstance was helpful) then
        objectTInstanceCounter++;
    else
        objectTInstanceCounter--;
variance(objectTInstance) = objectCounter2 - objectTInstanceCounter;
if (normalize(variance(objectTInstance)) &gt; threshold3) then
    move objectTInstance to prohibited list;
if (normalize(variance(objectTInstance)) &lt; threshold4) then
    delete objectTInstance;</p>
          <p>Statement Mediator/Mapper. Statement mediator/mapper is simply a
mapper that gets the template instance as input and generates OWL statements as
output. The OWL statements also contain all necessary links to the main object.
Table 1 demonstrates the OWL translation of some fixed templates. Note that
the italicised words are the variable words filled in by the narrator.
Post-Refinement (Refinement After Mediation) and Ontology
Assembler. After generating OWL representations, they need to be purified. Refining
statements is an iterative task that tries to build a summarized version of
statements based on resource URIs. Figure 3 demonstrates a sample of statement
refinement.</p>
          <p>As we mentioned earlier, the fixed templates are just highly recommended
proposals. If they do not help the narrator guide the guesser, he/she may simply
use English sentences. As these sentences have no structure, we keep them as
comments in the ontology if they were helpful for the guesser.</p>
          <p>Fig. 3. Statement Refinement Sample</p>
          <p>After all these processes, the general assembler is able to merge these
statements and build the first version of the ontology. This is an iterative task and
the ontology will be completed after several plays. Every Ontology has a
version track using owl:versionInfo that enables us to keep the history of generated
ontologies. Figure 4 demonstrates the iterative life cycle of generating ontologies.
4
To evaluate the quality of the generated ontologies, we have checked how they
change with an increasing number of rounds. To keep the presentation manageable,
we reduced the number of rounds to ten and the number of concepts to
two (tree and book).</p>
          <p>In the first round (see Section 2.1), the properties color, height, and age were
collected for the word tree. After ten rounds, we additionally collected leaves and
species. The same test performed for the word book resulted in five properties:</p>
          <p>Fig. 4. Iterative Life Cycle of Generating Ontologies
author, language, publisher, title, and year of publishing. Five more rounds gave us
three additional properties: number of pages, language and index. Tables
2 and 3 present the results that we have collected; we show both the words
that affected the created ontology and the words that were rejected. However,
the rejected words can become properties of the ontology, if we perform more
rounds.</p>
          <p>By analyzing more and more examples, we noticed that the number of
properties does not grow linearly with the number of rounds. Additionally, some of
the players were using plural versions of the words. This problem can be solved,
however, by using dictionaries. Moreover, the results provided by native
speakers were much more accurate, and they responded faster. We suggest using
lists of forbidden words; such lists force users, especially non-native English
speakers, to use ever more sophisticated vocabulary, since otherwise they stop
earning points at some stage. Hence, they have to learn new vocabulary.</p>
          <p>The next part of our experiment was to evaluate the second phase (see Section
2.2), in which each person was asked a set of questions related to the common
sense facts. Again we used the same words: tree and book. For the word tree,
there were just three questions that let the players successfully complete a
round: it is a kind of a plant; it has at least 1 height; it could be either oak or
larch. Five more rounds introduced additionally two more facts to our knowledge
base: it is disjoint with animals; and it has at least 1 root. The same example for
the word book resulted in three common sense facts in five rounds: it has at least
1 edition; it has at least 1 language; it could be either hard-copy or electronic.
Five more rounds resulted in two new statements: it has at least 1 author ; and
it has at least 1 title. Again we note that more and more rounds are necessary
to improve the quality of the ontologies.
</p>
          <p>Table 2. Accepted and rejected words for the word tree.
Rounds: 5. Accepted: Color, Height, Age. Rejected: Bark, Animals, Location, Kind, Fruit, Root, Branches, Green, Flower, Species, Width, Status, Leaves Falling, Seeds, Kind.
Rounds: 10. Accepted: Color, Height, Age, Leaves, Species. Rejected: Bark, Animals, Location, Kind, Fruit, Root, Branches, Green, Flower, Width, Status, Type, Name, Leaves Falling, Seeds, Kind.</p>
          <p>Table 3. Rejected words for the word book (the accepted words are those listed in the text above).
Rounds: 5. Rejected: Pages, Chapters, Words, Paragraph, Index, Foreword, Thickness, Audience Age, ISBN, Text, Abstract, Color.
Rounds: 10. Rejected: Pages, Chapters, Words, Paragraph, Index, Foreword, Thickness, Audience Age, ISBN, Text, Abstract, Color, Cover Type, Domain.</p>
          <p>5 Discussions</p>
          <p>
The aim of the OntoPair game is to build simple ontologies for different objects
that are located in images or even text-based objects in a short time. Our main
concern is that the game should be entertaining to encourage people to play it.
For this reason, we should avoid complex domains to be played. Some
complicated concepts like business categorizations can be out of scope of this game, as
these complicated domains may make the game boring and players will not come
back again. The other point is that the generated ontologies may not contain
all information regarding a domain, as the players are very ordinary people and
not from Semantic Web domain. This is the main advantage of the game, as it
cleverly uses people from different domains to help the Semantic Web domain
experts and scientists. However, we believe that ontologies will be complicated
after each play.</p>
          <p>Even though we proposed that the players should be paired randomly, there
is some potential for cheating: players could agree to log in at the same time
in order to be paired together and maliciously annotate the objects. To detect this,
we propose presenting, at random times and based on previous plays, specific
images or texts for which we know the exact properties of the objects; if
we notice that the players are not playing honestly, we let them play as long
as they want. The same solution is foreseen for the second phase of the game. As
we mentioned, to increase certainty, we only assign properties and statements
to objects if a certain number of players agreed upon them. As an
example, if only two players among many agreed that a car has a wing, we
give a low ranking to wing and, after filtering the properties using a threshold,
we omit it.</p>
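The agreement-threshold filtering described above can be sketched as follows; the 50% threshold, the data layout and the function name are illustrative assumptions, not details taken from the paper:

```python
from collections import Counter

# Sketch of the filtering step: a property is attached to an object only
# if enough players agreed on it. Threshold and layout are assumptions.

def filter_properties(votes, total_players, threshold=0.5):
    """Keep properties proposed by at least `threshold` of all players."""
    counts = Counter(votes)  # one entry per agreeing player
    return {prop for prop, n in counts.items() if n / total_players >= threshold}

# Example: 10 players annotated "car", but only 2 agreed on "wing".
votes = ["wheel"] * 9 + ["engine"] * 7 + ["wing"] * 2
accepted = filter_properties(votes, total_players=10)  # "wing" is omitted
```

With the hypothetical threshold of 50%, "wing" (2 votes out of 10) falls below the cut-off and is dropped, exactly as in the car-has-wing example above.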
          <p>Statistics and our experience show that word-guessing games are played by
many people, as these games are entertaining. Many people from non-English-speaking
countries play these games to improve their English.</p>
          <p>
For evaluating the generated ontologies, the game can be played in single-player
mode, with the single player playing against already-generated ontologies. If the
generated ontologies contain sufficient knowledge, the guesser should be able to
guess the correct words; otherwise a low ranking will be assigned to the generated
ontology. The other approach to evaluating OntoPair is to compare the
generated ontologies with ontologies that have been created by domain experts;
e.g. we can compare two ontologies for a domain like book, one from the OntoPair
repository and the other generated by hand.
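The second evaluation approach, comparing a game-generated ontology with an expert-built one, could for instance score the overlap of their property sets. The use of Jaccard similarity and the sample property sets below are our own assumptions, not a metric prescribed by the paper:

```python
# Sketch of the comparison-based evaluation: score the overlap between a
# game-generated property set and an expert-built one. Jaccard similarity
# is our choice of metric, and the sample sets are invented for illustration.

def jaccard(generated, expert):
    """Set overlap in [0, 1]; 1.0 means identical property sets."""
    if not generated and not expert:
        return 1.0
    return len(generated & expert) / len(generated | expert)

game_book = {"pages", "chapters", "author", "title", "isbn"}
expert_book = {"pages", "author", "title", "isbn", "publisher", "year"}
score = jaccard(game_book, expert_book)  # 4 shared out of 7 distinct
```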
6
In [
            <xref ref-type="bibr" rid="ref11 ref29">11</xref>
            ], the authors present an approach for building ontologies using a game
called OntoGame. They use Wikipedia articles as conceptual entities, present
them to the players, and have the users judge the ontological nature and find a
common abstraction for a given entry [
            <xref ref-type="bibr" rid="ref11 ref29">11</xref>
            ]. Our approach is different, as we do
not build a tree structure for objects. Instead, in two phases, we gather properties and
cardinalities as well as different instances of an object.
          </p>
          <p>
There are also some efforts towards building a knowledge base by means of
computer-based games. These games have been designed mostly for two players.
The ESP game [
            <xref ref-type="bibr" rid="ref25 ref7">7</xref>
] annotates images by requiring players to agree on
the exact objects located in them. Peekaboom [
            <xref ref-type="bibr" rid="ref27 ref9">9</xref>
] is another game, which tries
to determine the approximate location of objects in an image. Verbosity [
            <xref ref-type="bibr" rid="ref26 ref8">8</xref>
] is a
word-guessing game involving two players, a narrator and a guesser; the
former should guide the latter to the word he is looking for,
using some fixed templates for this purpose. Common Consensus [
            <xref ref-type="bibr" rid="ref22 ref4">4</xref>
            ] is very
similar to Verbosity [
            <xref ref-type="bibr" rid="ref26 ref8">8</xref>
            ], but it has its own templates which begin mostly with
Wh* questions. Phetch [
            <xref ref-type="bibr" rid="ref12 ref30">12</xref>
] is another game composed of two players, a
narrator and a guesser; the narrator should give the guesser some keywords to help
him or her select the right image from a list of images. In other words, Phetch’s
main goal is finding a specific image among a set of similar images.
          </p>
          <p>
There are also some other efforts in this general direction, mostly for
designing single-player games. LabelMe [
            <xref ref-type="bibr" rid="ref2 ref20">2</xref>
] is one example, which assigns the player an
image for annotation. Cyc1 is an artificial intelligence project that attempts
to assemble a comprehensive ontology and database of everyday common-sense
knowledge, with the goal of enabling AI applications to perform human-like
reasoning [
            <xref ref-type="bibr" rid="ref14 ref32">14</xref>
            ]. Cyc offers a web-based game called FACTory2, which gives the single
player several sophisticated common-sense facts from different domains; the
player should mark them as true or false statements within a short time period.
1 http://www.cyc.com/
          </p>
          <p>
At the beginning of the 1980s, Wille [
            <xref ref-type="bibr" rid="ref18 ref36">18</xref>
            ] initiated his work on a theory known
as Formal Concept Analysis. The aim of the theory is to analyze data and
identify conceptual structures within data sets. This work expanded rapidly
several years later and has been successfully applied in some specific domains,
e.g. bio-medicine [
            <xref ref-type="bibr" rid="ref1 ref19">1</xref>
            ]. However, such an approach often requires domain experts
to approve the results.
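As a toy illustration of Formal Concept Analysis (the context below is our own example, not from the cited work): given a binary object-attribute context, a formal concept is a pair (A, B) where A is exactly the set of objects sharing all attributes in B, and B is exactly the set of attributes common to all objects in A:

```python
from itertools import combinations

# Toy object-attribute context; the animals and attributes are invented.
context = {
    "duck":  {"flies", "swims"},
    "eagle": {"flies", "hunts"},
    "shark": {"swims", "hunts"},
}
objects = set(context)
attributes = set().union(*context.values())

def intent(objs):
    """Attributes shared by every object in objs (all attributes for the empty set)."""
    return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

def extent(attrs):
    """Objects possessing every attribute in attrs."""
    return {o for o in objects if attrs <= context[o]}

# Enumerate every formal concept by closing each subset of objects
# (exponential, but fine for a context this small).
concepts = set()
for r in range(len(objects) + 1):
    for objs in combinations(sorted(objects), r):
        b = intent(set(objs))
        a = extent(b)
        concepts.add((frozenset(a), frozenset(b)))
```

For this context the closure yields eight concepts, e.g. ({duck, eagle}, {flies}): exactly the fliers, and exactly what the fliers share.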
7
          </p>
          <p>Conclusion and Future Work
We have presented our work towards OntoPair, a game that uses Collective
Intelligence for building OWL-based ontologies. OntoPair collects properties and
common-sense facts about an object in an entertaining environment and builds
simple domain ontologies. We described how players compete and how
computers process and integrate the results. We also performed a simple
experiment showing how our knowledge base grows. Our prototype
implementation is still in progress3 and needs further work in the data and user
management areas. Moreover, future work will include a reputation model
that gives more weight to users held in high esteem. Linking different
ontologies together can also be considered as a next phase. As an example, if we
build an ontology for a wheel, and we have a common-sense fact indicating that
a car has a wheel, we may link the car and wheel ontologies. Furthermore, we
would like to perform more experiments to investigate how long it would take
a domain expert and ontology engineer to build an equivalent ontology. We also
would like to test OntoPair in more specific domains.</p>
          <p>Acknowledgments. The authors would like to thank Dr. Axel Polleres for his
valuable comments. This work is partially supported by Ecospace (Integrated
Project on eProfessional Collaboration Space) project: FP6-IST-5-35208, Lion
project supported by Science Foundation Ireland under Grant No.
SFI/02/CE1/I131, and Enterprise Ireland under Grant No. ILP/05/203.
2 http://207.207.9.186/
3 http://sourceforge.net/projects/OntoPair
Semantically Enhanced Webspace for Scientific</p>
          <p>Collaboration
Daniel Harezlak1, Piotr Nowakowski1, and Marian Bubak1,2
1 Academic Computer Center CYFRONET AGH
ul. Nawojki 11, 30-950 Krakow, Poland
2 Institute of Computer Science AGH
al. Mickiewicza 30, 30-059 Krakow, Poland</p>
          <p>d.harezlak@cyf-kr.edu.pl
Abstract. The paper presents an approach to constructing a collective
Web-based system for knowledge management. The work refers to the
concepts and ideas widely promoted by modern web communities, such as
user-created and user-annotated content or reliable search mechanisms.
Formal means, such as ontology-to-model dependencies within collective
knowledge, are also used to build the proposed system. The main focus
of this effort is directed towards scientific communities in which large
amounts of experimental data need to be classified and verified. For this
purpose an enhanced set of available Web tools needs to be assembled
and made available as a unified system.</p>
          <p>
            Key words: semantic models, web management, application plan,
collaborative research
1
The need to represent knowledge in a language that both people and computers
can comprehend is obvious and was demonstrated almost a decade ago [
            <xref ref-type="bibr" rid="ref1 ref19">1</xref>
            ]. Since
then, significant effort has been invested in combining the formalisms of descriptions
that can be parsed by computers with free-text content published by people all
over the world, creating the new notion of the Semantic Web. According to the
survey [
            <xref ref-type="bibr" rid="ref2 ref20">2</xref>
            ] the Semantic Web is increasing its momentum by expanding in the
areas of Internet computing such as trade, business and travel, not to mention
the science domain. Currently we observe that the technologies and tools used for
knowledge representation and management are becoming more stable and thus
models and services are being proposed [
            <xref ref-type="bibr" rid="ref21 ref22 ref3 ref4">3, 4</xref>
            ] to realize the vision of large-scale
knowledge integration.
          </p>
          <p>This paper focuses on scientific aspects of the Semantic Web, especially on
knowledge- and data-intensive applications, which need to benefit more fully from
the possibilities that become available through the manifestation of the Semantic
Web and its extensions. The basic challenge is to combine the collaborative and
global methods of using Web resources with individual and geographically
scattered research activities. Many modern approaches try to exploit the techniques
available in social Web management, such as tagging, ranking or editing Web
content by all users. However, more formal mechanisms are required for
scientific purposes. This goal can be supported by applying a strict semantic
framework to the way in which Web research is conducted. That is why we propose a
solution that incorporates a semantic layer into the available Web management
routines to facilitate scientific research.</p>
          <p>
A need for such an environment was observed in the ViroLab project [
            <xref ref-type="bibr" rid="ref23 ref5">5</xref>
            ] which
develops a virtual laboratory [
            <xref ref-type="bibr" rid="ref24 ref6">6</xref>
            ] to facilitate medical knowledge discovery and
provide decision support for HIV drug resistance [
            <xref ref-type="bibr" rid="ref25 ref7">7</xref>
            ]. Three groups of users have
been identified: clinicians using decision support systems for drug ranking,
experiment developers who plan complex biomedical simulations, and experiment
users who apply prepared experiments (scripts) [
            <xref ref-type="bibr" rid="ref26 ref8">8</xref>
            ]. An experiment is a kind of
processing which may involve acquiring input data from distributed resources,
running remote operations on this data, and storing results in a dedicated space,
which should not limit its functionality to the medical disciplines but should extend
into other areas of science.
          </p>
          <p>The following section reviews current achievements in the Semantic Web
area. Subsequently, a list of requirements for the proposed solution is presented.
The following two sections contain the architecture and the proposed semantic
enhancements, followed by the current implementation status and a summary with
a future workplan.</p>
          <p>
This work tries to go beyond the present state of building scientific web
communities by proposing a system which covers traditional computation
infrastructures with lightweight yet reliable, research-oriented web
interfaces supporting knowledge management. In principle, it builds upon existing
achievements of the Semantic Web; however, a novel approach to managing
semantic descriptions by web community members is introduced. This requires new
combinations of tools for managing semantic metadata and social techniques for
editing web content.
2
Modern systems in which semantic descriptions are used to represent
knowledge generally apply tested and reliable languages, such as OWL [
            <xref ref-type="bibr" rid="ref27 ref9">9</xref>
            ], which is
based on an older RDF specification [
            <xref ref-type="bibr" rid="ref10 ref28">10</xref>
            ]. Another standard used by a significant
group of people is WSMO [
            <xref ref-type="bibr" rid="ref11 ref29">11</xref>
            ], which provides methods to semantically describe
Web services. A problem, however, arises when different groups of researchers
try to create descriptions of the same phenomena or elements of reality,
resulting in inconsistencies when such descriptions are merged. This requires manual
alignment, which can be very time-consuming and inefficient. In order to
efficiently build ontologies, a semiautomatic tool is required to provide feedback on
preexisting descriptions and enable scientists to further build upon them, thus
ensuring coherency.
          </p>
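The semi-automatic feedback on preexisting descriptions might, as a minimal sketch, start from label similarity between concept names, leaving the confirmation to the scientist. The heuristic, the cutoff and the sample labels below are our assumptions, not part of the described tool:

```python
from difflib import SequenceMatcher

# Naive alignment hint: propose likely-equivalent concept pairs from two
# ontologies by string similarity of their labels, for a human to confirm.

def suggest_alignments(labels_a, labels_b, cutoff=0.8):
    """Return (label_a, label_b, score) pairs above the similarity cutoff."""
    suggestions = []
    for a in labels_a:
        for b in labels_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= cutoff:
                suggestions.append((a, b, score))
    return sorted(suggestions, key=lambda t: -t[2])

pairs = suggest_alignments({"Virus", "DrugResistance"},
                           {"virus", "drug_resistance", "Patient"})
```

A real tool would combine such lexical hints with structural evidence, but even this sketch shows how preexisting descriptions can be surfaced for reuse rather than re-created.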
          <p>
            It is easy to observe that the social Web has evolved into a global
collaboration space where people from all over the world exchange experience using
systems such as Facebook [
            <xref ref-type="bibr" rid="ref12 ref30">12</xref>
            ] or Flickr [
            <xref ref-type="bibr" rid="ref13 ref31">13</xref>
            ]. This way of collaboration has made
the Web an interesting tool for scientific communities to exchange
research results and knowledge. Several attempts have been undertaken to benefit
from those ideas, resulting in applications like [
            <xref ref-type="bibr" rid="ref14 ref32">14</xref>
            ] and new trends in
semantic computing [
            <xref ref-type="bibr" rid="ref15 ref33">15</xref>
            ]. These attempts, however, still lack general acceptance and
stability. Nevertheless, several environments are already available and are being
used by smaller groups. For example, myExperiment [
            <xref ref-type="bibr" rid="ref16 ref34">16</xref>
            ], currently in its beta
testing phase, is a successor to well-accepted workflow management systems
such as Taverna [
            <xref ref-type="bibr" rid="ref17 ref35">17</xref>
            ] or BIOSteer [
            <xref ref-type="bibr" rid="ref18 ref36">18</xref>
            ]. The project delivers a Web-based system
for sharing workflows among community members; however, the infrastructure
does not provide features that allow workflow execution and result management.
3
          </p>
          <p>
Requirements
In order to satisfy potential researchers, any new system should ease their work.
Therefore, basic requirements should be identified first. Below we present a list
which attempts to formalize the process by which research is conducted. In
particular, it is assumed that each type of supported scientific research can be aided
either by applying a computer system to conduct a virtual experiment (such as
a simulation) or by presenting the results in a digital format. The following is a list
of basic requirements for a knowledge Web management system.
– application plan storage - The notion of an application plan exists in various
domains of science and can be described as a list of steps necessary to achieve
a certain result. There are many ways to represent such a list. It can be
accomplished either by building a workflow (e.g. with the BPEL [
            <xref ref-type="bibr" rid="ref37">19</xref>
            ] notation)
or by using a script (with any available scripting language). The requirement
is to provide a facility for application storage that can be accessed by
authorized users. In this way published applications can be discovered, reused,
assessed and improved by other scientists.
– managing application execution - For application plan execution to be
possible, an underlying infrastructure has to be deployed and a proper
application plan execution engine needs to be set up. The whole process of
application execution has to be visualized for the user and, if necessary,
intermediate results should be delivered.
– managing scientific results - The outcome of a research activity should be
represented by a result stored in a dedicated database. The results should
be properly annotated and classified, and available to other scientists for
verification purposes.
– collaborating with other scientists - The system should provide
collaboration tools enabling scientists to discover each other's work, properly restricted by
security and copyright agreements. It should also be convenient to exchange
experience and validate others' work within one system.
          </p>
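A minimal sketch of the data model implied by the first requirement (discoverable, reusable application plans) might look as follows; every class, field and tag name here is our own invention, not the system's actual API:

```python
from dataclasses import dataclass, field

# Hypothetical plan-storage model: a plan is a named list of steps (a
# workflow or a script, per the paper) owned by a user and tagged for search.

@dataclass
class ApplicationPlan:
    name: str
    steps: list                    # workflow steps or a script
    owner: str
    tags: set = field(default_factory=set)

class PlanRepository:
    """Stores published plans so other scientists can discover and reuse them."""
    def __init__(self):
        self._plans = {}

    def publish(self, plan):
        self._plans[plan.name] = plan

    def find_by_tag(self, tag):
        return [p for p in self._plans.values() if tag in p.tags]

repo = PlanRepository()
repo.publish(ApplicationPlan("drug-ranking", ["fetch data", "rank drugs"],
                             owner="alice", tags={"HIV", "simulation"}))
```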
          <p>The presented list of requirements should be supported by a semantic model
that facilitates all the functionalities which are to be provided by the proposed
system.</p>
          <p>Another non-functional requirement is to separate the process of
application development from the conduct of research. On the one hand, developers do not
want to be burdened with the semantics of a certain research area but only with,
e.g., data formats, computation optimization, etc. On the other hand, scientists
want to focus only on the research without knowing the specifics of the actual
implementation. This requires a certain separation layer, provided by the
experiment plan. The only notions shared between the two groups are the
experiment plan, the input data and the experiment result. Developers write
experiments together with the underlying services, components, etc., which require input
data and produce results (the format of the data is of course to be agreed
between those two groups). The researchers execute the experiments, and validate and
classify the data, being able to manage the semantic layer.</p>
          <p>One last requirement that was identified is cross-disciplinary cooperation
among researchers. Creating a global and ultimate ontology seems an impossible
challenge. However, it might be possible to find intersections between ontologies and
benefit from what others work on. The approach in the proposed system is to
make all the semantic metadata available to all participants. In order to do
that, an advanced editor is required to assist the researchers in the process of
managing the metadata.
4.1
In Fig. 1 the basic architecture is presented. The system is divided into four
layers. At the bottom, the resource layer consists of services and data sources
which are used to build application plans using workflow or script notations
that provide some level of abstraction. In the same layer, the Metadata Store
and the Application Repository are deployed and used to archive semantic data
and application plans respectively. These two components are accessed
directly by the Web application layer (shown in green). The next, yellow layer is
the middleware, which provides an abstraction over the low-level resources and
ensures unified access to the variety of technologies that implement data sources
and computational services. In this way access to data and services is seamlessly
woven into the notation. The Application Execution Engine also maintains the
state of the applications during execution.</p>
          <p>The third layer, representing Web applications, contains two modules, namely
the Metadata Engine module and the Execution Client module. The first module
is responsible for managing the semantic descriptions available in the system.
It also constitutes a filter and a tool that helps users manage the semantic
content they provide or browse. Based on the semantic model presented in the
next section, users are able to:
– import their own semantic descriptions by semi-automatically aligning and
mapping them against existing ones,</p>
          <p>Fig. 1. Basic components of the proposed system.</p>
          <p>– browse the existing knowledge by conveniently searching through existing
ontology triples,
– quickly obtain application plans, results or publications of interest by
providing keywords (the whole knowledge space is tagged and annotated),
– tag and annotate the existing objects in the knowledge space.</p>
          <p>
The second module - the Execution Client - is responsible for communication
with the application execution engine and for keeping users updated on the
current execution status using AJAX-oriented techniques (e.g. implemented with
the GWT toolkit [
            <xref ref-type="bibr" rid="ref38">20</xref>
            ]).
4.2
The Metadata Engine is the main component which provides the reasoning
functionality over the ontologies built within the system. It covers the low-level
Metadata Store and exposes convenient methods to manage the knowledge structure.
          </p>
          <p>In Fig. 2 a detailed architecture of the Metadata Engine is presented. It
contains a client that enables it to access the underlying metadata store and
facilitates the use of the query language used by the store.</p>
          <p>Fig. 2. Internal architecture of the Metadata Engine.</p>
          <p>The deduction module
is divided into two parts. For simple queries, for which response times should be
short, the client-side part is used. It communicates with the client through
an asynchronous channel, following the techniques used in web client-server
communication models (built over the standard request-response model). The calls
are made directly by the visual components and conclude with an update of their
visual state. If the queries are more complex, the deduction module on the
server side is used. To the visual components this is transparent, however, apart
from longer response times.
5
In Fig. 3 a sample of the ontology model is presented. This model is used as the
basis for the Metadata Engine module to manage the collaboration space.</p>
          <p>The model consists of three parts:
– Science Domain (blue) - This part of the semantic description is extendable
by users. This ensures that the model remains dynamic and, when required,
users may add custom ontological descriptions to existing ones. The process
is semi-supervised by the system in order to maintain coherency.
– Basic Model (orange) - This model is the core of the application and its
basic models. It assumes (in accordance with social Web content management)
that every item within the collaboration space may be tagged or annotated.
This enables the space to be enhanced by a quick search mechanism or by
building a tag cloud (used for space browsing).
– Application Model (green) - This ontology model allows the Metadata
Engine to keep track of the content managed by users. In particular, users
are able to submit specific queries that navigate to accurate pieces of data
stored in the collaboration space (e.g. list all publications that describe the
outcomes of a particular application plan, etc.)</p>
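The example query from the Application Model, listing all publications that describe the outcomes of a particular application plan, can be sketched as a filter over triples; the predicate names and identifiers below are made-up stand-ins for the real model:

```python
# Hypothetical triples from the Application Model; predicates are invented.
triples = [
    ("plan:genotype2resistance", "hasOutcome", "result:r1"),
    ("plan:genotype2resistance", "hasOutcome", "result:r2"),
    ("pub:paper42", "describes", "result:r1"),
    ("pub:paper99", "describes", "result:r7"),
]

def publications_for(plan):
    """Join a plan's outcomes with the publications that describe them."""
    outcomes = {o for s, p, o in triples if s == plan and p == "hasOutcome"}
    return {s for s, p, o in triples if p == "describes" and o in outcomes}

pubs = publications_for("plan:genotype2resistance")
```

In a real deployment such a query would be expressed against the Metadata Store's query language rather than hand-rolled over a Python list; the two-step join, outcomes first, then describing publications, is the part the sketch illustrates.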
          <p>The presented model is just a proposal, showing how the final
implementation could look, and it remains a subject of ongoing research. It is also possible
to test several different models in different research contexts.
5.2</p>
          <p>Role and Ontology Management
In order to ensure hierarchy in the process of managing and building the
ontologies, proper groups need to be modelled with certain permissions. Also, a way of
assessing the quality of the ontologies is required to introduce formal models of
the management process.</p>
          <p>Figure 4 depicts a sample structure of such an ontology. The main Object node
is assigned the is editable by relation, which specifies which roles are permitted to
edit a given node.</p>
          <p>Fig. 4. Role management dependency semantics.</p>
          <p>All Role nodes are referenced by User nodes, which creates the
authorization net in the proposed model.</p>
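The is editable by check from Fig. 4 reduces to a set intersection between a user's roles and the roles permitted to edit a node; the concrete role and node names below are illustrative assumptions:

```python
# Hypothetical authorization net: Object nodes list permitted roles, and
# each user references a set of Role nodes.

object_editable_by = {
    "node:virus-ontology": {"experiment-developer", "curator"},
}
user_roles = {
    "alice": {"curator"},
    "bob": {"clinician"},
}

def can_edit(user, node):
    """A user may edit a node if any of their roles is permitted for it."""
    return bool(user_roles.get(user, set()) & object_editable_by.get(node, set()))
```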
          <p>
To enable users to extend the current ontology graph,
a Community Node is introduced. This node is inherited by all the nodes
created by community members, and in the process of collaborative cooperation
among scientific communities it is assessed, with the quality information stored in
the individuals of the Node Quality. The quality will be measured by analyzing
usage statistics of each knowledge node (e.g. the more users use and cite a given
ontology node, the higher rank it has). Further improvements of this approach
will categorize the semantic descriptions into those approved and validated and those
still unassessed. Hopefully, this will lay the groundwork for building community
ontologies across different science domains. The model itself may be changed
while the system is working.
6
Currently the presented model is being implemented within the virtual
laboratory supporting the scripting approach to representing application plans [
            <xref ref-type="bibr" rid="ref24 ref6">6</xref>
            ]. The
application execution engine is already [
            <xref ref-type="bibr" rid="ref39">21</xref>
            ] operational and capable of running
test application plans. Simple ontology models have been built; however, they
still require user assessment in order to be improved.
          </p>
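The Node Quality assessment described above, ranking community-created nodes by how often they are used and cited and splitting approved from unassessed ones, might be sketched as a weighted score; the weights and the approval threshold are illustrative assumptions:

```python
# Hypothetical node-quality score: usage and citation counts, weighted.

def quality(uses, citations, w_use=1.0, w_cite=2.0):
    """Higher use and citation counts yield a higher rank."""
    return w_use * uses + w_cite * citations

stats = {  # node -> (uses, citations); sample values are invented
    "node:mutation": (40, 12),
    "node:viral-load": (5, 0),
}
ranked = sorted(stats, key=lambda n: quality(*stats[n]), reverse=True)
approved = [n for n in ranked if quality(*stats[n]) >= 20]
```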
          <p>With respect to the web application layer a prototype of the user interface
was built and a screenshot is depicted in Fig. 5.</p>
          <p>The interface is divided into three parts:
– application management - In this widget the user is able to browse the
collaboration space in search of application plans of interest. The search is</p>
          <p>The overlapping window in the middle is displayed as a popup and in this
case is used to show the application plan script. Each application plan may be
supplied with a license regarding its usage restrictions.
7</p>
          <p>Conclusions and Future Work
This paper presents a semantic Web-based approach to constructing a
scientific collaboration space. The solution combines social Web routines with the
formalisms of semantic content descriptions to facilitate the process of on-line
research. The main improvements of the approach include integration of the
application runtime system with result management and adoption of widely-used Web
content management techniques in the area of scientific research.</p>
          <p>At present the ViroLab virtual laboratory already integrates biomedical
information related to viruses (proteins and mutations), patients (viral load) and
literature (drug resistance); it enables experiments to be planned and run
transparently on distributed resources. Different experiments from the virology domain
are executable, such as: from virus genotype to drug resistance interpretation,
querying historical and provenance information about experiments, assisting a
virologist with the Drug Resistance System, and simple data mining with
classification. Further work will extend this list and explore re-usability in different
science disciplines.</p>
          <p>Future plans include the extension of the semantic model used for building
the prototype and extending the user community to test and assess the approach.
The aim is to benefit from the ideas brought by Semantic Web trends and
extend the present solutions in the area of community-driven research to make
the process more reliable and efficient.</p>
          <p>Acknowledgments. This work is partly funded by the European Commission
under the ViroLab IST-027446 and the IST-2002-004265 Network of Excellence
CoreGRID projects.</p>
          <p>References</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Franz</given-names>
            <surname>Baader</surname>
          </string-name>
          , Bernhard Ganter, Baris Sertkaya, and
          <string-name>
            <given-names>Ulrike</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <article-title>Completing description logic knowledge bases using formal concept analysis</article-title>
          .
          <source>In Manuela M. Veloso, editor, IJCAI</source>
          , pages
          <fpage>230</fpage>
          -
          <lpage>235</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Bryan C.</given-names>
            <surname>Russell</surname>
          </string-name>
          , Antonio Torralba, Kevin P. Murphy, and William T. Freeman.
          <article-title>LabelMe: a database and web-based tool for image annotation</article-title>
          .
          <source>In MIT AI Lab Memo AIM-2005-025</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Dan</given-names>
            <surname>Brickley</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.V.</given-names>
            <surname>Guha</surname>
          </string-name>
          .
          <article-title>Resource Description Framework (RDF) Schema Specification</article-title>
          . http://www.w3.org/TR/rdf-schema/,
          <year>February 2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Henry</given-names>
            <surname>Lieberman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dustin</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>Alea</given-names>
            <surname>Teeters</surname>
          </string-name>
          .
          <article-title>Common Consensus: A Webbased Game for Collecting Commonsense Goals</article-title>
          .
          <source>In Workshop on Common Sense for Intelligent Interfaces</source>
          ,
          <source>ACM International Conference on Intelligent User Interfaces (IUI-07)</source>
          , Honolulu, Hawaii, USA,
          <year>2007</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Google</given-names>
            <surname>Inc</surname>
          </string-name>
          . Google Image Labeler. http://images.google.com/imagelabeler/,
          <year>2007</year>
          . Online; accessed 3-May-
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Pierre</given-names>
            <surname>Levy</surname>
          </string-name>
          .
          <source>Collective Intelligence</source>
          . Plenum Publishing Corporation,
          <year>January 1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Luis von Ahn and
          <string-name>
            <given-names>Laura</given-names>
            <surname>Dabbish</surname>
          </string-name>
          .
          <article-title>Labeling images with a computer game</article-title>
          .
          <source>In CHI '04: Proceedings of the 2004 conference on Human factors in computing systems</source>
          , pages
          <fpage>319</fpage>
          -
          <lpage>326</lpage>
          . ACM Press,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Luis</given-names>
            <surname>von Ahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mihir</given-names>
            <surname>Kedia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Manuel</given-names>
            <surname>Blum</surname>
          </string-name>
          .
          <article-title>Verbosity: a game for collecting common-sense facts</article-title>
          .
          <source>In CHI '06: Proceedings of the SIGCHI conference on Human Factors in computing systems</source>
          , pages
          <fpage>75</fpage>
          -
          <lpage>78</lpage>
          , New York, NY, USA,
          <year>2006</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Luis</given-names>
            <surname>von Ahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ruoran</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Manuel</given-names>
            <surname>Blum</surname>
          </string-name>
          .
          <article-title>Peekaboom: a game for locating objects in images</article-title>
          .
          <source>In CHI '06: Proceedings of the SIGCHI conference on Human Factors in computing systems</source>
          , pages
          <fpage>55</fpage>
          -
          <lpage>64</lpage>
          , New York, NY, USA,
          <year>2006</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Sean</given-names>
            <surname>Bechhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Frank</given-names>
            <surname>van Harmelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jim</given-names>
            <surname>Hendler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ian</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Deborah L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Lynn Andrea</given-names>
            <surname>Stein</surname>
          </string-name>
          .
          <article-title>OWL Web Ontology Language Reference</article-title>
          . http://www.w3.org/TR/owl-ref/,
          <year>February 2004</year>
          . [Online; accessed 2-May-2007].
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Katharina</given-names>
            <surname>Siorpaes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Martin</given-names>
            <surname>Hepp</surname>
          </string-name>
          .
          <article-title>OntoGame: Towards Overcoming the Incentive Bottleneck in Ontology Building</article-title>
          .
          <source>In 3rd International IFIP Workshop On Semantic Web and Web Semantics (SWWS '07), co-located with OTM Federated Conferences</source>
          , Vilamoura, Portugal, pages
          <fpage>1222</fpage>
          -
          <lpage>1232</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Luis</given-names>
            <surname>von Ahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Shiry</given-names>
            <surname>Ginosar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mihir</given-names>
            <surname>Kedia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ruoran</given-names>
            <surname>Liu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Manuel</given-names>
            <surname>Blum</surname>
          </string-name>
          .
          <article-title>Improving accessibility of the web with a computer game</article-title>
          .
          <source>In CHI '06: Proceedings of the SIGCHI conference on Human Factors in computing systems</source>
          , pages
          <fpage>79</fpage>
          -
          <lpage>82</lpage>
          , New York, NY, USA,
          <year>2006</year>
          . ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Wikipedia</surname>
          </string-name>
          .
          <article-title>CAPTCHA - Wikipedia, the free encyclopedia</article-title>
          ,
          <year>2007</year>
          . [Online; accessed 14-December-2007].
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Wikipedia</surname>
          </string-name>
          .
          <article-title>Cyc - wikipedia, the free encyclopedia</article-title>
          . http://en.wikipedia.org/w/index.php?title=Cyc&amp;oldid=125786119,
          <year>2007</year>
          . [Online; accessed 7-May-2007].
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Wikipedia</surname>
          </string-name>
          .
          <article-title>Guessing game - wikipedia, the free encyclopedia</article-title>
          . http://en.wikipedia.org/w/index.php?title=Guessing_game&amp;oldid=116214370,
          <year>2007</year>
          . [Online; accessed 6-May-2007].
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Wikipedia</surname>
          </string-name>
          .
          <article-title>Human-based computation - wikipedia, the free encyclopedia</article-title>
          . http://en.wikipedia.org/w/index.php?title=Human-based_computation&amp;oldid=122965665,
          ,
          <year>2007</year>
          . [Online; accessed 7-May-2007].
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wikipedia</surname>
          </string-name>
          .
          <article-title>Collective intelligence - Wikipedia, the free encyclopedia</article-title>
          ,
          <year>2008</year>
          . [Online; accessed 17-March-2008].
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>R.</given-names>
            <surname>Wille</surname>
          </string-name>
          .
          <article-title>Restructuring lattice theory: An approach based on hierarchies of concepts</article-title>
          .
          <source>In I. Rival (Ed.), Ordered Sets</source>
          , volume
          <volume>23</volume>
          ,
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          1.
          <string-name>
            <surname>Chandrasekaran</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Josephson</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Benjamins</surname>
            ,
            <given-names>V.R.</given-names>
          </string-name>
          :
          <article-title>What are ontologies, and why do we need them</article-title>
          ?
          <source>IEEE Intelligent Systems</source>
          <volume>14</volume>
          (
          <issue>1</issue>
          ) (January/
          <year>February 1999</year>
          )
          <fpage>20</fpage>
          -
          <lpage>26</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cardoso</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The semantic web vision: Where are we</article-title>
          ?
          <source>IEEE Intelligent Systems</source>
          <volume>22</volume>
          (
          <issue>5</issue>
          ) (September/
          <year>October 2007</year>
          )
          <fpage>84</fpage>
          -
          <lpage>88</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          3.
          <string-name>
            <surname>Missier</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alper</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corcho</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dunlop</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goble</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Requirements and services for metadata management</article-title>
          .
          <source>IEEE Internet Computing</source>
          <volume>11</volume>
          (
          <issue>5</issue>
          ) (September/
          <year>October 2007</year>
          )
          <fpage>17</fpage>
          -
          <lpage>25</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          4.
          <string-name>
            <surname>Carroll</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dickinson</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dollin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reynolds</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seaborne</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilkinson</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Jena: Implementing the semantic web recommendations</article-title>
          .
          <source>Technical report, HP Labs</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          5. ViroLab Consortium:
          <article-title>ViroLab - EU IST STREP Project 027446</article-title>
          (
          <year>2008</year>
          ), http://www.virolab.org
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          6. ACC CYFRONET AGH:
          <article-title>ViroLab virtual laboratory</article-title>
          (
          <year>2008</year>
          ), http://virolab.cyfronet.pl
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          7.
          <string-name>
            <surname>Sloot</surname>
            ,
            <given-names>P.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tirado-Ramos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Altintas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bubak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boucher</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>From molecule to man: Decision support in individualized e-health</article-title>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          8.
          <string-name>
            <surname>Gubala</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bubak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Gridspace - semantic programming environment for the grid</article-title>
          .
          <source>LNCS</source>
          <volume>3911</volume>
          (
          <year>2006</year>
          )
          <fpage>172</fpage>
          -
          <lpage>179</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          9. W3C:
          <article-title>OWL Web Ontology Language</article-title>
          (
          <year>2004</year>
          ), http://www.w3.org/TR/owl-features
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          10. W3C:
          <article-title>RDF: Resource Description Framework</article-title>
          (
          <year>2001</year>
          ), http://www.w3.org/RDF
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          11.
          <string-name>
            <surname>Roman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keller</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lausen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Bruijn</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lara</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stollberg</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feier</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bussler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fensel</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Web service modeling ontology</article-title>
          .
          <source>Applied Ontology</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ) (
          <year>January 2005</year>
          )
          <fpage>77</fpage>
          -
          <lpage>106</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          12.
          Facebook Team:
          <article-title>A social utility that connects people with friends and others who work, study and live around them</article-title>
          (
          <year>2008</year>
          ), http://www.facebook.com
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          13.
          Yahoo! Inc.:
          <article-title>Photo sharing web space</article-title>
          (
          <year>2008</year>
          ), http://www.flickr.com
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          14.
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>G.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guha</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McMullen</surname>
            ,
            <given-names>D.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mustacoglu</surname>
            ,
            <given-names>A.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pierce</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Topcu</surname>
            ,
            <given-names>A.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wild</surname>
            ,
            <given-names>D.J.</given-names>
          </string-name>
          :
          <article-title>Web 2.0 for grids and e-science</article-title>
          .
          <source>In: INGRID 2007 - Instrumenting the Grid</source>
          , 2nd International Workshop on Distributed Cooperative Laboratories, S. Margherita Ligure - Portofino, Italy (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          15.
          <string-name>
            <surname>Goble</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Roure</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Grid 3.0: Services, semantics and society</article-title>
          .
          <source>In: Proceedings of Cracow Grid Workshop</source>
          <year>2007</year>
          , ACC CYFRONET AGH (
          <year>2008</year>
          )
          <fpage>10</fpage>
          -
          <lpage>11</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          16. The University of Manchester and University of Southampton: myExperiment home page (
          <year>2008</year>
          ), http://www.myexperiment.org
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          17.
          <string-name>
            <surname>Oinn</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Addis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferris</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marvin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Senger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Greenwood</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carver</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glover</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pocock</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wipat</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Taverna: a tool for the composition and enactment of bioinformatics workflows</article-title>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          18.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>T.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hashmi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cummings</surname>
            ,
            <given-names>M.P.</given-names>
          </string-name>
          :
          <article-title>Bio-STEER: A Semantic Web workflow tool for grid computing in the life sciences</article-title>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          19. OASIS:
          <article-title>Web services business process execution language (</article-title>
          <year>2007</year>
          ), http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wsbpel
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          20. Google:
          <article-title>Google Web Toolkit</article-title>
          (
          <year>2008</year>
          ), http://code.google.com/webtoolkit
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          21.
          <string-name>
            <surname>Ciepiela</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kocot</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gubala</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malawski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kasztelnik</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bubak</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Gridspace engine of the virolab virtual laboratory</article-title>
          .
          <source>In: Proceedings of Cracow Grid Workshop</source>
          <year>2007</year>
          , ACC CYFRONET AGH (
          <year>2008</year>
          )
          <fpage>53</fpage>
          -
          <lpage>58</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>