<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Integrating Web 2.0 Data into Linked Open Data Cloud via Clustering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eirini Giannakidou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Athena Vakali</string-name>
          <email>avakali@csd.auth.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aristotle University of Thessaloniki Department of Informatics</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this position paper, we claim that useful information mined from social tagging systems can be published and shared as Linked Data, so that it can be further exploited. The proposed approach aims to: i) be applicable to any social tagging platform, and ii) utilize information that concerns the collective activity of users rather than individual transactions. Such information includes topics of interest, emerging trends, events, and communities around events or trends. To mine this knowledge from tagging data, a number of clustering algorithms are suggested.</p>
      </abstract>
      <kwd-group>
        <kwd>linked data</kwd>
        <kwd>web 2.0</kwd>
        <kwd>tagging</kwd>
        <kwd>clustering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>With the advent of Web 2.0, tagging has become a collective means of
metadata creation that is suggestive of individual users' interests, since users
tend to tag content that they find interesting. As more and more people have
joined this surge, tag data provides a rich information source for studying social
patterns and emergent drifts and directions in the web user community. The main
problem with this information, though, is that it is kept isolated in a number
of proprietary systems and formats and thus cannot be reused or further
exploited.</p>
      <p>
        Research in the field of the Semantic Web helps applications to
provide extensibility, flexibility, interoperability and reusability. To address such
issues in the tag space, many researchers claim that applying mature
Semantic Web technologies (e.g. formal representation of content, formal
relations) to tagged data could add great value to the latter, as it may lend
structure to the data, remove ambiguities and promote reuse of content.
A number of standards and initiatives in this direction involve SKOS [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], FOAF
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], SIOC [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and MOAT [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. OpenID [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] is also a step in this
direction, providing an authentication mechanism that is currently accepted by many
sites and thus allows users' digital identities to be consolidated. However, although
interoperability between tagging systems is an active subject of research,
these approaches have not yet found widespread application and, so far, there
is no common agreement on a formal representation of tagging activity across
social tagging systems.
      </p>
      <p>More recently, the Linked Data initiative has emerged, focusing on pragmatic
approaches that help applications to share and connect data. Specifically, it
suggests publishing data online using open standards and, in addition, linking to other existing
datasets. In this position paper, we argue for publishing as Linked Data the
information mined from tagging data via advanced clustering methods. The proposed
approach can be applied to the tagging data of any Web 2.0 platform, provided that
it offers API access to such data. Its application is expected to enable the
combination of data from different sources in a standardized manner and, thus,
to allow the development of services that provide added value to the user community
across application and domain boundaries. The rest of the paper is organized
as follows. First, the current status of Linked Data and its applications is
described, concluding with a list of issues that have not been addressed so far.
Then, our proposal for publishing information from Web 2.0 data as
Linked Data is given. Finally, the paper ends with some applications and some general
conclusions on the proposed approach.
      </p>
    </sec>
    <sec id="sec-2">
      <title>The Linked Data Landscape in 2010 and what is missing</title>
      <p>
        In 2006, Tim Berners-Lee presented Design Notes on how to publish Linked Data
on the Web, in which he outlined four basic principles that should be followed.
They can be summarized as follows: each resource should have an HTTP URI, so that it
can be dereferenced either by users or by agents, as well as a semantic description,
which should be represented in standard formats such as RDF/XML. Further,
the use of links that associate representations of the same resource in different
datasets is encouraged, to promote the discovery of related information about the
resource at hand. Following this article, a number of research initiatives appeared
that aimed at producing linked data out of distributed datasets on the
Web, the most widely known being the Linking Open Data (LOD) project (http://linkeddata.org/). In
October 2007, the datasets in the LOD project consisted of over two billion RDF
triples, which were interlinked by over two million RDF links. By September
2010 this had grown to 25 billion RDF triples, interlinked by around 395 million
RDF links. Moreover, a number of applications that follow the
aforementioned principles have been developed, as described in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        A detailed review of linked data initiatives and applications can be found
in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As can be seen there, current linked data initiatives
can be roughly categorized into two groups: i) approaches that mainly reuse content
of datasets in the LOD cloud [3-5], and ii) approaches that enable associating
HTTP URIs with User Generated Content (UGC), thus resulting in structured
UGC [
        <xref ref-type="bibr" rid="ref6 ref7">7, 6</xref>
        ]. All such efforts that concern the publication of linked data (i.e. turning
raw web data into linked data) and its consumption in domain-specific or
more generic frameworks have created a new landscape on the web map that
contains tools, technologies and datasets following these principles and keeps on
expanding as more and more applications and sites embrace linked data. Figure
1 illustrates schematically the current type of information binding between the Web
2.0 and linked data landscapes.
      </p>
      <p>
        We now focus on what is currently missing from the Linked Data
landscape concerning its interaction with datasets from the Web 2.0 landscape.
Given the popularity of Web 2.0 platforms and the useful information that can
be extracted from these sites, as reported in the related literature ([
        <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
        ]), we
identify two shortcomings in the current binding between these two
landscapes:
      </p>
      <list list-type="bullet">
        <list-item><p>there are no mappings from the abundance of Web 2.0 platforms to Linked Data (cf. App1 in Figure 1);</p></list-item>
        <list-item><p>there are no mappings at the data aggregation level (currently, the mappings are at the
individual data level, so the "Wisdom of the Crowds" is lost).</p></list-item>
      </list>
      <p>To address the aforementioned issues, we propose a clustering approach in
the next section.</p>
    </sec>
    <sec id="sec-3">
      <title>Using Clustering for Integrating Web 2.0 to Linked Data</title>
      <p>The current shared vision for the future is one of semantically rich information
and service-oriented architecture for global information systems. As individual
Web 2.0 platforms are not embracing semantic tools to unambiguously define
their data, this vision is still far from reality. Here we present an approach
for integrating Web 2.0 data into linked data and, thus, deriving structured
information from raw Web 2.0 data. An asset of the presented approach is that it does
not place any burden on individual Web 2.0 platforms; instead, it can be applied
to each one of them, provided that the platform offers access to the user data.</p>
      <p>Furthermore, the proposed approach focuses on publishing as linked data not
individual pieces of data, but information that is extracted through clustering from the collective user
activity in a Web 2.0 platform. The idea is illustrated schematically
in Figure 2 and summarized in the following steps:</p>
      <list list-type="order">
        <list-item><p>Apply clustering techniques to the data of a Web 2.0 platform, to extract interesting
patterns in the data.</p></list-item>
        <list-item><p>Map the terms mined in the extracted patterns to terms already in the LOD
cloud.</p></list-item>
        <list-item><p>Publish the mined information as linked data and create RDF links to the
mapped terms in the LOD cloud.</p></list-item>
      </list>
      <p>
        Clustering approaches have been heavily researched for mining useful
information (such as topics, trends and user communities around specific topics)
out of tagging data in Web 2.0 platforms. The clustering techniques we
propose to employ during the first step may be either i) co-clustering algorithms
to get groups of related content with associated tags, which can be mapped to
specific topics ([
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]), ii) user-based clustering to extract topics of interest related
to specific user communities inside the Web 2.0 platform ([
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]), or iii) time-aware
clustering to track events or how users are attracted to particular topics
over time ([
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]).
      </p>
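      <p>As a minimal sketch of the first step, the following Python snippet groups tags by how strongly the sets of resources they annotate overlap. It is a deliberately simple stand-in (Jaccard similarity followed by connected components), not the co-clustering, user-based or time-aware algorithms cited above; the sample data, the function names and the 0.5 threshold are illustrative assumptions.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict
from itertools import combinations


def tag_resources(assignments):
    """Map each tag to the set of resources it was applied to."""
    index = defaultdict(set)
    for resource, tag in assignments:
        index[tag].add(resource)
    return index


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_tags(assignments, threshold=0.5):
    """Group tags whose resource sets overlap strongly.

    Tags are nodes; an edge joins two tags whose Jaccard similarity
    is >= threshold; clusters are the connected components.
    """
    index = tag_resources(assignments)
    parent = {tag: tag for tag in index}

    def find(tag):
        # Union-find lookup with path halving.
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]
            tag = parent[tag]
        return tag

    for t1, t2 in combinations(sorted(index), 2):
        if jaccard(index[t1], index[t2]) >= threshold:
            parent[find(t1)] = find(t2)

    clusters = defaultdict(set)
    for tag in index:
        clusters[find(tag)].add(tag)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical (resource, tag) pairs, as a tagging API might return them.
SAMPLE = [
    ("photo1", "concert"), ("photo1", "music"),
    ("photo2", "concert"), ("photo2", "music"),
    ("photo3", "traffic"), ("photo3", "parking"),
    ("photo4", "traffic"), ("photo4", "parking"),
]

if __name__ == "__main__":
    print(cluster_tags(SAMPLE))
```
]]></preformat>
      <p>On the sample data this yields one cluster for the concert-related tags and one for the traffic-related tags; on real tagging data the threshold would need tuning, and a proper co-clustering method would also group the resources themselves.</p>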
      <p>
        Once the clustering algorithm has been applied, the information
extracted in each cluster should be mapped to terms already in the LOD cloud.
During this step, a semantic indexer like Sindice [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] may be employed, which
takes input terms and returns related resources in DBpedia.
      </p>
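      <p>The mapping step can be illustrated, under strong simplifying assumptions, by building candidate DBpedia resource URIs directly from the mined terms. The sketch below does not query Sindice or any other index; it only assumes that a term may match a Wikipedia article title once its first letter is capitalised and spaces are replaced by underscores. The function names are hypothetical, and each candidate would still have to be verified before any link is published.</p>
      <preformat><![CDATA[
```python
from urllib.parse import quote

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def candidate_dbpedia_uri(term):
    """Build a candidate DBpedia resource URI from a free-text term.

    Assumption: the term corresponds to an article title after
    capitalising its first letter and joining words with underscores.
    The result is a guess that must still be checked.
    """
    words = term.strip().split()
    if not words:
        raise ValueError("empty term")
    label = "_".join(words)
    label = label[0].upper() + label[1:]
    # Percent-encode anything that is not safe inside a URI path segment.
    return DBPEDIA_RESOURCE + quote(label, safe="_(),'")


def map_cluster(terms):
    """Return a {term: candidate URI} mapping for one mined cluster."""
    return {term: candidate_dbpedia_uri(term) for term in terms}


if __name__ == "__main__":
    for term, uri in map_cluster(["concert", "rock music"]).items():
        print(term, "->", uri)
```
]]></preformat>
      <p>In practice an indexer or lookup service would replace this guess, since tags are often ambiguous and many have no corresponding resource at all.</p>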
      <p>Finally, we propose publishing this information as Linked Data, so that
it can be further exploited. This involves representing the data in RDF
and creating RDF links that connect the mined information with the mapped
terms already in the LOD cloud, which were found in the previous step.</p>
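      <p>A minimal sketch of the publishing step, serialising one mined cluster as N-Triples, is given below. Modelling a cluster as a skos:Concept, recording its tags with skos:altLabel and linking it to mapped resources with skos:relatedMatch are illustrative modelling assumptions rather than a vocabulary prescribed by this paper; the example.org URI and the function names are likewise hypothetical.</p>
      <preformat><![CDATA[
```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def literal(text):
    """Serialise a plain string as an N-Triples literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def cluster_to_ntriples(cluster_uri, tags, mapped_uris):
    """Describe one mined cluster and link it to resources in the LOD cloud."""
    subject = "<" + cluster_uri + ">"
    lines = [subject + " <" + RDF_TYPE + "> <" + SKOS + "Concept> ."]
    for tag in tags:
        # Each tag in the cluster becomes a label of the cluster.
        lines.append(subject + " <" + SKOS + "altLabel> " + literal(tag) + " .")
    for uri in mapped_uris:
        # Each mapped term becomes an RDF link into an existing dataset.
        lines.append(subject + " <" + SKOS + "relatedMatch> <" + uri + "> .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(cluster_to_ntriples(
        "http://example.org/cluster/1",  # hypothetical cluster URI
        ["concert", "music"],
        ["http://dbpedia.org/resource/Concert"],
    ))
```
]]></preformat>
      <p>To satisfy the Linked Data principles, the cluster URI would also have to be dereferenceable, i.e. served over HTTP together with this description.</p>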
    </sec>
    <sec id="sec-5">
      <title>Vision</title>
      <p>We envisage the employment of the proposed approach in the Future Internet,
the so-called Internet of Things, in a number of contexts that aim to jointly employ
information coming from the sensor data of "smart" devices and information
coming from users' activities on Web 2.0 sites. Publishing such information as
Linked Data allows data that refer to the same topic but come from different Web
2.0 sites to be interlinked, so that the widest possible span of available information
about a particular topic can be exploited.</p>
      <p>Figure 3 illustrates a sample usage scenario of the proposed approach in the
context of a smart city environment. Specifically, we claim that the proposed
approach can be used by automatic policy makers to improve a transportation
system under special circumstances, or even to design a better mobility plan
for the city. Smart cities will become a reality when the objects around us
become smart and begin to embody the functionality of our computers and mobile
devices (Internet of Things). Thus, there will be internet-enabled cars, taxis,
buses, cameras on the road and so on. All such information will be forwarded
to the policy maker agent, which records the current status of the roads of the city.
Additional information extracted from Web 2.0 sites may signal events that are
expected to disrupt normal city traffic (e.g. concerts, protests). As
a result, moving vehicles will be aware of such events and act accordingly
(e.g. in the case of a highly popular concert, a number of taxis may decide to
head in that direction to serve the music fans).</p>
      <p>In the long run, the sensor data coming from devices may be combined with
the analysis of information coming from Web 2.0 users, so that new mobility plans
can be proposed (e.g. bicycle lanes, pedestrian roads). The novel aspect here is
that the proposals will be driven not only by the actual needs of the citizens, as
recorded in the sensor data of the devices, but also by citizens' opinions
and emerging trends, as expressed through citizens' activities on Web 2.0
sites (e.g. if many people criticize the parking problem in an area and upload photos
that illustrate the problem, a policy maker could evaluate parking deficiencies in
that area and recommend solutions such as additional parking, parking
pricing and management strategies). The role of Linked Data in such scenarios
is to link information that is mined from various Web 2.0 sites and refers to the
same topic.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>Here we suggested a way to publish and consume linked data derived from tagging
data; the presented approach can be applied to the tagging data of any Web 2.0
application, provided that it offers API access to such data. The benefit of such
a process is that dynamic information about a web community (such as trends,
topics of interest of user groups, etc.) becomes available to all. This information
is of particular importance to people and professions interested in listening to
the public pulse, such as businessmen and public administration bodies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Hausenblas</surname>
          </string-name>
          ,
          <article-title>Linked Data Applications - The Genesis and the Challenges of Using Linked Data on the Web</article-title>
          ,
          <source>DERI Technical Report</source>
          , (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>M.</given-names>
            <surname>Hausenblas</surname>
          </string-name>
          :
          <article-title>Exploiting Linked Data to Build Web Applications</article-title>
          , IEEE Internet Computing (vol.
          <volume>13</volume>
          no.
          <issue>4</issue>
          ) pp.
          <fpage>68</fpage>
          -
          <lpage>73</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>S.</given-names>
            <surname>Softic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hausenblas</surname>
          </string-name>
          ,
          <article-title>Towards Opinion Mining Through Tracing Discussions on the Web</article-title>
          ,
          <source>In: Social Data on the Web SDoW 2008 Workshop at the 7th International Semantic Web Conference Karlsruhe</source>
          , Germany: (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>G.</given-names>
            <surname>Kobilarov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Scott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Raimond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Oliver</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sizemore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Smethurst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lee</surname>
          </string-name>
          :
          <article-title>Media Meets Semantic Web - How the BBC Uses DBpedia and Linked Data to Make Connections</article-title>
          .
          <source>In 6th European Semantic Web Conference, ESWC 2009</source>
          ,pages
          <fpage>723</fpage>
          -
          <lpage>737</lpage>
          . Springer, (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>C.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          :
          <article-title>DBpedia Mobile: A Location-Enabled Linked Data Browser</article-title>
          .
          <source>In WWW 2008 Workshop: Linked Data on the Web (LDOW2008)</source>
          , Beijing, China, (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Motta</surname>
          </string-name>
          .
          <article-title>Revyu: Linking reviews and ratings into the Web of Data</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>6</volume>
          (
          <issue>4</issue>
          ):
          <fpage>266</fpage>
          -
          <lpage>273</lpage>
          , (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Faviki -
          <article-title>Social bookmarking tool using smart semantic Wikipedia (DBPedia) tags</article-title>
          , http://www.faviki.com
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>E.</given-names>
            <surname>Giannakidou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Koutsonikola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vakali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          :
          <article-title>Co-clustering Tags and Social Data Sources</article-title>
          ,
          <source>In Proc. 9th International Conference On Web-Age Information Management (WAIM</source>
          '
          <year>2008</year>
          ), IEEE Computer Society, pp
          <fpage>317</fpage>
          -
          <lpage>324</lpage>
          , Zhangjiajie, China (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>V.</given-names>
            <surname>Koutsonikola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vakali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Giannakidou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          :
          <article-title>Clustering Users of a Social Tagging System: A Topic and Time Based Approach</article-title>
          ,
          <source>In Proc. of the 10th International Conference on Web Information Systems Engineering (WISE</source>
          <year>2009</year>
          ), Springer Verlag, pp
          <fpage>75</fpage>
          -
          <lpage>86</lpage>
          , Poznan, Poland (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>E.</given-names>
            <surname>Giannakidou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Koutsonikola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vakali</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Kompatsiaris</surname>
          </string-name>
          :
          <article-title>Exploring Temporal Aspects in User-Tag Co-Clustering</article-title>
          ,
          <source>In Proc. of the 11th International Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS</source>
          <year>2010</year>
          ),
          Desenzano del Garda, Italy
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A.</given-names>
            <surname>Passant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Laublet</surname>
          </string-name>
          :
          <article-title>Meaning Of A Tag: A collaborative approach to bridge the gap between tagging and Linked Data</article-title>
          ,
          <source>in Proceedings of the WWW 2008 Workshop Linked Data on the Web (LDOW2008)</source>
          , Beijing, China, (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Mike</given-names>
            <surname>Graves</surname>
          </string-name>
          , Adam Constabaris, Dan Brickley:
          <article-title>FOAF: Connecting People on the Semantic Web</article-title>
          , doi = 10.1300/J104v43n03_11, In:
          <source>Cataloging &amp; Classification Quarterly</source>
          , Vol.
          <volume>43</volume>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Alistair</given-names>
            <surname>Miles</surname>
          </string-name>
          , Brian Matthews, Michael Wilson, and Dan Brickley:
          <article-title>SKOS core: simple knowledge organisation for the web</article-title>
          .
          <source>In Proceedings of the 2005 international conference on Dublin Core and metadata applications: vocabularies in practice (DCMI '05)</source>
          . Dublin Core Metadata Initiative ,
          <source>Article</source>
          <volume>1</volume>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. John G. Breslin, Stefan Decker, Andreas Harth, and
          <string-name>
            <given-names>Uldis</given-names>
            <surname>Bojars</surname>
          </string-name>
          .
          <article-title>SIOC: an approach to connect web-based communities</article-title>
          .
          <source>Int. J. Web Based Communities</source>
          <volume>2</volume>
          (
          <issue>2</issue>
          ):
          <fpage>133</fpage>
          -
          <lpage>142</lpage>
          . DOI=10.1504/IJWBC.2006.010305 http://dx.doi.org/10.1504/IJWBC.2006.010305
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. David Recordon and
          <string-name>
            <given-names>Drummond</given-names>
            <surname>Reed</surname>
          </string-name>
          .
          <article-title>OpenID 2.0: a platform for user-centric identity management</article-title>
          .
          <source>In Proceedings of the second ACM workshop on Digital identity management (DIM '06)</source>
          . ACM, New York, NY, USA,
          <fpage>11</fpage>
          -
          <lpage>16</lpage>
          . DOI=10.1145/1179529.1179532 http://doi.acm.org/10.1145/1179529.1179532 (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Hotho</surname>
          </string-name>
          , Robert Jäschke, Christoph Schmitz, Gerd Stumme:
          <article-title>Trend Detection in Folksonomies</article-title>
          .
          <source>In Proceedings of the First International Conference on Semantics And Digital Media Technology (SAMT)</source>
          , Vol.
          <volume>4306</volume>
          , pp.
          <fpage>56</fpage>
          -
          <lpage>70</lpage>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Peter</given-names>
            <surname>Mika</surname>
          </string-name>
          :
          <article-title>Ontologies Are Us: A Unified Model of Social Networks and Semantics</article-title>
          .
          <source>International Semantic Web Conference</source>
          , pp:
          <fpage>522</fpage>
          -
          <lpage>536</lpage>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Eyal</given-names>
            <surname>Oren</surname>
          </string-name>
          , Renaud Delbru, Michele Catasta, Richard Cyganiak, Holger Stenzhorn, and Giovanni Tummarello:
          <article-title>Sindice.com: a document-oriented lookup index for open linked data</article-title>
          .
          <source>Int. J. Metadata Semant. Ontologies</source>
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <fpage>37</fpage>
          -
          <lpage>52</lpage>
          . (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>