=Paper=
{{Paper
|id=Vol-369/paper-2
|storemode=property
|title=Weaving SIOC into the Web of Linked Data
|pdfUrl=https://ceur-ws.org/Vol-369/paper01.pdf
|volume=Vol-369
|dblpUrl=https://dblp.org/rec/conf/www/BojarsPCB08
}}
==Weaving SIOC into the Web of Linked Data==
Uldis Bojārs
Digital Enterprise Research Institute, National University of Ireland, Galway
uldis.bojars@deri.org

Alexandre Passant
Electricité de France, R&D, Clamart, France & LaLIC, Université Paris-Sorbonne, Paris, France
alexandre.passant@edf.fr

Richard Cyganiak
Digital Enterprise Research Institute, National University of Ireland, Galway
richard@cyganiak.de

John Breslin
Digital Enterprise Research Institute, National University of Ireland, Galway
john.breslin@deri.org
Copyright is held by the author/owner(s). LDOW2008, April 22, 2008, Beijing, China.

ABSTRACT
Social media sites can act as a rich source of large amounts of data by letting anyone easily create content on the Web. The SIOC ontology and tools developed for it allow us to express this information as interlinked RDF data. This paper describes approaches for making this Social Web information an integral part of the Web of Linked Data.

1. INTRODUCTION
By envisioning a Web of Data, in addition to the Web of Documents, the Semantic Web provides a way to represent real-world data, as well as virtual objects, in a uniform manner. The Social Web can also benefit from a common way to link and represent content created through online communities and social media websites.

The Web currently uses untyped HTML hyperlinks to connect web pages and mainly considers documents without any semantics. RDF allows us to describe information about resources, including real-world objects, and the different types of relations between them. The Linked Data initiative takes this a step further and defines best practices for publishing linked data on the web [2]: use URIs as names for things; use HTTP URIs so that people can look up those names; when someone looks up a URI, provide useful information; include links to other URIs so that people can discover more related things.

SIOC (Semantically Interlinked Online Communities) [7] is a step towards providing linked data from social media sites (forums, blogs, wikis, etc.) and provides a common vocabulary to describe metadata of such sites in RDF. The project itself is built in a collaborative way, since it involves many partners and welcomes suggestions on its SIOC-DEV mailing list (http://groups.google.com/group/sioc-dev), and its work has recently been published as a W3C Member Submission (http://www.w3.org/Submission/2007/02/).

This paper describes how Social Web data, described in SIOC, can be further woven together with other kinds of linked data. We describe the structure of SIOC data and the different approaches to interlinking SIOC data with other sources of linked data. Finally, we conclude the paper with some interesting uses of such data.

2. SIOC DATA
The SIOC implementations list (http://rdfs.org/sioc/applications/) contains around 35 applications for creating and using SIOC data. By installing relevant SIOC export plugins, online community sites can generate linked data and start forming a critical mass of RDF data about user-created content in the same way as LiveJournal did for FOAF data. Other tools allow users to browse SIOC data or to translate existing data, such as mailing list archives, to SIOC. Moreover, SIOC is also used for enterprise data integration [9], and some popular Web 2.0 sites such as Seesmic (http://seesmic.com) have started using it to model their data.

To demonstrate the linked nature of SIOC data created by these sites, let us look at data generated by a SIOC export plugin for a blog site (e.g. the WordPress SIOC plugin, http://sioc-project.org/wordpress). It creates a set of RDF documents describing the blog itself and every post, comment and user on this blog. Note that data instances exported from a larger site (e.g., a bulletin board) are similar, but larger in number, and may contain multiple sioc:Forum objects whereas a blog has just one.

An important property of SIOC data is that all the RDF documents generated by an exporter are interlinked using rdfs:seeAlso links (Fig. 1):
[Figure 1: Structure of linked SIOC data (node labels: sioc:Site, Posts, Comments, Author)]
• a blog profile links to all posts on the blog;
• posts link to replies / comments;
• posts and comments link to the profile of a user who created them (if a user is registered on a site).

This makes SIOC-enabled social media sites a rich source of interlinked data and enables an RDF crawler to start at the main profile of a site and retrieve all SIOC RDF pages that this site provides from a single entry point, i.e. the RDF file describing the sioc:Site. Almost all SIOC-enabled sites also have RDF autodiscovery links in their web pages, making it easier for people and applications to find RDF data when opening a page. Another interesting use of this autodiscovery feature is implemented in the Semantic Radar Firefox extension (http://sioc-project.org/firefox). The extension automatically sends a notification to the Ping The Semantic Web service (http://pingthesemanticweb.com) whenever the user accesses an autodiscovery-enabled web page. This allows people to participate in the distributed discovery of Semantic Web content [6].

While these features make a SIOC-enabled site a good source of linked data, at this point it can still be viewed as a “walled garden” and not really as an integral part of a larger Web of Linked Data. On its own, SIOC data from a single exporter may have limited connectivity to the “outside world”. At the moment, links to other websites are mainly obtained by extracting HTML hyperlinks from post content. These hyperlinks are republished in the post’s SIOC profile using the sioc:links_to property.

3. LINKING DATA WITH SIOC
There are two paths to making online community data, described in SIOC, a more integrated part of the Web of Data: linking into “SIOC space”, and referencing linked data from inside online community sites. We will now describe both of these paths, starting with linking to SIOC data.

3.1 Linking to SIOC Data
The question of receiving links from the “outside world”, strictly speaking, is not something SIOC exporters can directly influence, as this depends on other sources of data linking to SIOC sites. Yet, there are some things that can be done to facilitate linking to SIOC:

• owners of FOAF profiles can link to social media sites and their user accounts on these sites;
• SIOC exporters can be optimized to make SIOC data easier to discover;
• Semantic Web indexing and lookup services can find and provide access to SIOC data.

A simple and effective way to link to SIOC profiles is to point to them from personal FOAF profiles. For example, people often already link to their blogs from FOAF using the foaf:weblog property. In this case they just need to add an rdfs:seeAlso link pointing to the weblog’s SIOC profile, which also mentions the weblog’s homepage URI and thus allows a client to connect the information in the two documents.

People can also use the foaf:holdsOnlineAccount property to point to user accounts (sioc:User) that they have registered on online community sites (Fig. 2). As a part of SIOC data, these user accounts are normally linked to content items created by the user.

[Figure 2: Interlinking SIOC, FOAF and SKOS]

If we can make it easier for applications and users browsing the Web to find SIOC data, they are more likely to use this data. RDF autodiscovery links play an important role in helping one to find Semantic Web data, and almost all SIOC exporters use them, making it a reliable way to detect the presence of SIOC data. As a result, all web pages on a SIOC-enabled site contain RDF autodiscovery links to relevant RDF data (e.g., a blog post will point to data about this particular post).

Another way of improving discoverability is content negotiation. The idea is that client requests to URIs of resources such as forum posts are answered with either an HTML or an RDF representation, depending on the client’s preferences. This lets existing URIs of the forum’s web pages, which are already used as targets of HTML links, double as access points to the SIOC data when used by RDF-aware clients. We have experimented with adding content negotiation to some SIOC exporters, but this has proven to be challenging. First, implementing full content negotiation as described in the HTTP specification is fairly complex. Second, it is hard to verify that an implementation works correctly with the large number of different versions of Web and data browsers in use on the Web, some of which express questionable preferences. Third, the plugin systems of some content management systems do not provide the necessary hooks for implementing content negotiation. Therefore, we recommend that linked data browsers such as Tabulator [3] also make use of RDF autodiscovery information as an alternative to content negotiation.

SIOC exporters can also use RDFa (http://www.w3.org/TR/xhtml-rdfa-primer/) to embed SIOC data directly in HTML documents, e.g. blog posts. A first approach to embedding RDFa in Drupal has been explored (http://groups.google.com/group/sioc-dev/browse_thread/thread/b1585e9ef3a17665), and future work may go in this direction, since it provides another way to discover RDF triples related to a document.

Finally, some site administrators have decorated their pages with small SIOC icons that link directly to the related SIOC profile. These icons can be considered redundant as we already have RDF autodiscovery links, or can even be considered harmful as users who click the icon in non-RDF-aware Web browsers will be subjected to raw RDF/XML code. On the other hand, such icons are a useful marketing tool that increases awareness of SIOC in general.

In general, it can be said that we desire a user experience where URIs of Web resources such as blog posts can simply be used in HTML hyperlinks and RDF statements to refer to the resource. Mechanisms such as RDF autodiscovery, content negotiation or RDFa parsing should be transparent to the user and entirely reliable.

An important aspect of linked data is being able to find what data is linking into a resource. Online community sites sometimes notify each other of links they make by using pingbacks or trackbacks, but other publishers usually do not send such notifications. This is where Semantic Web indexing services such as Sindice [12] play an important role. Such a service can find the incoming RDF statements that reference a resource. These “backlinks” can be displayed to users, either on SIOC-enabled sites or directly in the user interface of linked data browsers, thus helping users to navigate these links in both directions.

3.2 Linking From SIOC Data to the Outside
Some resources that users may want to link to include:

• Persons - authors of posts and comments => link to their FOAF URI and FOAF profile;
• Topics - categories and tags => link to other linked data about these topics;
• Other data - information associated with / embedded within content.

FOAF is one of the most successful Semantic Web vocabularies and is often used to describe personal information and social network relations. One of the first use cases for linking to other RDF data is to link posts or comments to FOAF profiles of their creators. Currently, sioc:Post instances are linked to a creator’s user profile on a community site, but export tools can be easily extended to point to a provided FOAF profile and to the URI of a creator of this content.

In order to point to FOAF profiles, the application has to know what URI to point to. Users of a community site (e.g., post authors) can supply this information when registering on the site, but comment authors may not be motivated enough to provide the necessary FOAF information, so an automated process for finding this data is preferable. This can be achieved automatically using OpenID (http://apassant.net/blog/2007/09/23/retrieving-foaf-profile-from-openid/): a user authenticates using OpenID and a blog engine checks the OpenID URI for links to the person’s FOAF profile. If such a link is found, statements pointing to the person’s FOAF profile (e.g., owl:sameAs and rdfs:seeAlso) can be added to SIOC metadata describing the author of the comment. Some work on this is currently being done within the SparqlPress project (http://wiki.foaf-project.org/SparqlPress), which aims to provide Semantic Web functionality for exporting, crawling and linking RDF data.

Categories and tags which describe the topic of the content are also good candidates for pointing to additional information. SIOC exporters use a sioc:topic property to point to a topic URI. Topic category hierarchies can be described in SKOS and topics can be linked to more information about them such as RDF data from DBpedia [1]. Tag vocabularies such as the Tag Ontology, MOAT (Meaning Of A Tag, http://moat-project.org/), or SCOT [8] can be used to associate more information with tags. However, SIOC exporters need to know what URIs to point to for a given tag or category.

One option is to add this knowledge to the site in advance. Adding annotations every time a new topic is created may work for categories that do not change very often, but is not feasible for tags, which are more dynamic. Another option is to rely on the MOAT framework, which allows us to assign meaning to tags in a collaborative way using existing URIs from datasets such as DBpedia or GeoNames [11]. For example, some users in a community may have indicated that the sparql tag means http://dbpedia.org/resource/SPARQL. When a user tags a post with the sparql tag, the DBpedia URI will be offered as a possible meaning of the tag, and can be selected with a single click for exporting as a sioc:topic link. This provides an efficient way to interlink community site posts and DBpedia. Thanks to semantic relationships within DBpedia itself, useful information about how the topics of different posts relate is only a link away (Fig. 3).
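The tag-to-topic step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MOAT implementation: the `TAG_MEANINGS` table and the function names `candidate_meanings` and `topic_triple` are hypothetical, the mapping is held in memory rather than fetched from a MOAT server, and the second meaning listed for `paris` is an illustrative placeholder. The only vocabulary term used is sioc:topic from the SIOC namespace.

```python
from typing import Dict, List

SIOC_TOPIC = "http://rdfs.org/sioc/ns#topic"

# Hypothetical, in-memory stand-in for meanings that a community has
# collaboratively attached to its tags (a real deployment would query a service).
TAG_MEANINGS: Dict[str, List[str]] = {
    "sparql": ["http://dbpedia.org/resource/SPARQL"],
    "paris": [
        "http://dbpedia.org/resource/Paris",
        "http://dbpedia.org/resource/Paris,_Texas",  # illustrative second meaning
    ],
}


def candidate_meanings(tag: str) -> List[str]:
    """Return the URIs offered to the user as possible meanings of a tag."""
    return TAG_MEANINGS.get(tag.strip().lower(), [])


def topic_triple(post_uri: str, tag: str, chosen_uri: str) -> str:
    """Serialize the meaning the user selected as one N-Triples sioc:topic statement."""
    if chosen_uri not in candidate_meanings(tag):
        raise ValueError(f"{chosen_uri!r} is not a known meaning of tag {tag!r}")
    return f"<{post_uri}> <{SIOC_TOPIC}> <{chosen_uri}> ."


if __name__ == "__main__":
    post = "http://example.org/post/1"
    for uri in candidate_meanings("sparql"):
        print(topic_triple(post, "sparql", uri))
```

Keeping the user's explicit choice as a required argument mirrors the single-click selection described in the text: the exporter never guesses a meaning for an ambiguous tag.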
[Figure 3: Interlinking blog posts thanks to SIOC, MOAT and DBpedia.]
[Figure 4: FOAF, SIOC and Data Portability.]
[Labels recoverable from the figure area: http://tags.moat-project.org/tag/sparql, tags:name, sparql, tags:associatedTag, http://example.org/tagging/1, moat:tagMeaning, tags:taggedBy, tags:taggedResource, http://example.org/alex, http://example.org/post/1, http://dbpedia.org/resource/SPARQL, skos:subject, http://dbpedia.org/resource/Category:Web_services, sioc:topic, sioc:has_creator, http://myblog.net/post, http://something.net/post, http://myblog.net/john, http://something.net/uldis]
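The kind of relationship drawn in Figure 3, where posts on different sites become related through DBpedia, can be illustrated with a small in-memory triple list. This is a sketch under assumptions: the exact edges of the figure are not recoverable from the text, so the data below is hypothetical (in particular, the topic of the second post, `SOAP`, is invented for the example), and `posts_by_category` is an illustrative helper rather than part of any SIOC tool. The property URIs are sioc:topic and the skos:subject property that appears among the figure labels.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

SIOC_TOPIC = "http://rdfs.org/sioc/ns#topic"
SKOS_SUBJECT = "http://www.w3.org/2004/02/skos/core#subject"
DBP = "http://dbpedia.org/resource/"

# Hypothetical data: two posts on different blogs whose topics share a category.
TRIPLES: List[Triple] = [
    ("http://myblog.net/post", SIOC_TOPIC, DBP + "SPARQL"),
    ("http://something.net/post", SIOC_TOPIC, DBP + "SOAP"),  # assumed topic
    (DBP + "SPARQL", SKOS_SUBJECT, DBP + "Category:Web_services"),
    (DBP + "SOAP", SKOS_SUBJECT, DBP + "Category:Web_services"),
]


def posts_by_category(triples: List[Triple]) -> Dict[str, List[str]]:
    """Group posts by the categories of their topics (post -> topic -> category)."""
    categories_of: Dict[str, Set[str]] = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == SKOS_SUBJECT:
            categories_of[subject].add(obj)

    posts: Dict[str, Set[str]] = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == SIOC_TOPIC:
            for category in categories_of.get(obj, ()):
                posts[category].add(subject)
    return {category: sorted(found) for category, found in posts.items()}


if __name__ == "__main__":
    for category, related in posts_by_category(TRIPLES).items():
        print(category, "->", related)
```

Neither post links to the other directly; they only become related because each exporter published a sioc:topic link to a DBpedia URI, which is the point the figure makes.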
There are other resources that authors may want to describe or link to. Some SIOC exporters have built-in functionality for defining links to additional RDF data. The Drupal and WordPress exporters pass post content through a filter to find hyperlinks with MIME type application/rdf+xml. Appropriate rdfs:seeAlso statements are added to the generated SIOC RDF data for this post in addition to the usual sioc:links_to property. This simple mechanism allows us to add links to RDF data to existing blog posts with very little effort. As a possible extension of this mechanism, some information about the linked resources may be copied into the post’s SIOC profile, e.g. rdfs:label and rdf:type information, which assists users of data browsers in navigation.

Sometimes, having a link to external RDF data is not enough. Within a post, a user may want to express some additional machine-readable information about the objects discussed in the post. As an example use case, an author describes a software project, points to DOAP (Description Of A Project) data describing this project, and includes a review of this project. This information can be added to SIOC data, but we need a way to add additional information to content items and to be able to retrieve it later.

Such embedded annotations are not widespread yet and current SIOC tools do not implement this functionality, but they can be extended to support such a use case. Filters for extracting metadata from post content can be executed one after another and RDF data can be extracted and added to the generated SIOC data. Data can be embedded in content in a number of different ways, such as RDFa, GRDDL (http://www.w3.org/TR/grddl-primer/) or microformats, as long as appropriate content extraction modules are available.

4. USING SOCIAL MEDIA DATA
The Social Web is not limited to forums or blogs. There are different kinds of social media and Web 2.0 sites, such as Flickr, Twitter and Facebook, which offer interesting content that can be described in RDF using FOAF, SIOC and related vocabularies [4]. Developers have already created exporters for such tools and some of them use SIOC to describe user accounts and content, as in exporters for Twitter (http://sioc-project.org/node/262) and Flickr [10].

Information from users’ contributions to these sites can be interesting at different levels. There are social networks, which are formed by users on different sites. These networks can be described in FOAF. Web users may also have a FOAF profile which points to their different accounts on different sites. Users create content items (sioc:Item) such as videos, bookmarks, etc., which can be organised in containers (Fig. 4). When expressed in RDF, this information forms an interlinked web of rich social data, ready for reuse. Moreover, the SIOC types module (http://rdfs.org/sioc/types) allows people to describe exactly the type of their content (e.g. sioct:BlogPost, sioct:VideoChannel).

Two interesting and emerging applications of such information are object-centred sociality and social media portability. Object-centred sociality looks at objects such as content items which people create and co-annotate as a medium through which people are connected together. The use of SIOC data for object-centred sociality is explored in [5]. Data or social media portability (http://www.dataportability.org/) is an initiative aimed at providing open standards for discovery, import, export and synchronisation of user profiles, relationships, content and media. SIOC and FOAF, combined with domain-specific ontologies, allow us to describe most of such information and can form a solution for social media portability in an open and machine-readable way (Fig. 4).

5. CONCLUSION
SIOC data created by online community sites are highly interlinked and ready for weaving into a larger Web of Data. In this paper we described some approaches for facilitating the linking to SIOC data and for using SIOC to link back to other RDF data.
While some may consider social media content just an “endpoint” in a journey of exploration of linked data, interesting possibilities arise when these content items are both linked to and contain links to other RDF data. In this case social media content and associated SIOC data act as a linking point by connecting together different parts of the linked data universe.

The authors are looking forward to feedback and suggestions from other implementers of Linked Data on the Web for enabling interoperability and reuse of the SIOC data and tools described here.

6. REFERENCES
[1] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. G. Ives. DBpedia: A Nucleus for a Web of Open Data. In ISWC/ASWC, volume 4825 of Lecture Notes in Computer Science, pages 722–735. Springer, 2007.
[2] T. Berners-Lee. Design Issues–Linked Data. Published online, May 2007. http://www.w3.org/DesignIssues/LinkedData.html.
[3] T. Berners-Lee, Y. Chen, L. Chilton, D. Connolly, R. Dhanaraj, J. Hollenbach, A. Lerer, and D. Sheets. Tabulator: Exploring and Analyzing linked data on the Semantic Web. In Proceedings of the 3rd International Semantic Web User Interaction Workshop (SWUI06), Nov 2006.
[4] U. Bojārs, J. Breslin, A. Finn, and S. Decker. Using the Semantic Web for Linking and Reusing Data Across Web 2.0 Communities. The Journal of Web Semantics, Special Issue on the Semantic Web and Web 2.0 (Forthcoming), 2008.
[5] U. Bojars, B. Heitmann, and E. Oren. A Prototype to Explore Content and Context on Social Community Sites. SABRE Conference on Social Semantic Web (CSSW 2007), 2007.
[6] U. Bojārs, A. Passant, F. Giasson, and J. G. Breslin. An Architecture to Discover and Query Decentralized RDF Data. Proceedings of the 3rd Workshop on Scripting for the Semantic Web (SFSW 2007), Innsbruck, Austria, Jun 2007.
[7] J. G. Breslin, A. Harth, U. Bojars, and S. Decker. Towards Semantically-Interlinked Online Communities. In The 2nd European Semantic Web Conference (ESWC ’05), Heraklion, Greece, Proceedings, May 2005.
[8] H. L. Kim, J. G. Breslin, S. K. Yang, and H. G. Kim. Social Semantic Cloud of Tag: Semantic Model for Social Tagging. Proceedings of the 2nd KES International Symposium on Agent and Multi-Agent Systems: Technologies and Applications (Forthcoming), Incheon, Korea, 2008.
[9] A. Passant. A Collaborative Semantic Space for Enterprise. Proceedings of the Knowledge Web PhD Symposium 2007 (KWEPSY 2007), Innsbruck, Austria, Jun 2007.
[10] A. Passant. :me owl:sameAs flickr:33669349@N00. Proceedings of the WWW 2008 Workshop Linked Data on the Web (LDOW2008), Beijing, China, Apr 2008. Demo presentation.
[11] A. Passant and P. Laublet. Meaning Of A Tag: A collaborative approach to bridge the gap between tagging and Linked Data. Proceedings of the WWW 2008 Workshop Linked Data on the Web (LDOW2008), Beijing, China, Apr 2008.
[12] G. Tummarello, R. Delbru, and E. Oren. Sindice.com: Weaving the Open Linked Data. Proceedings of the International Semantic Web Conference (ISWC 2007), 2007.