<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Real-time #SemanticWeb in &lt;= 140 chars</article-title>
      </title-group>
      <contrib-group>
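        <contrib contrib-type="author">
          <name>
            <surname>Shinavier</surname>
            <given-names>Joshua</given-names>
          </name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tetherless World Constellation, Rensselaer Polytechnic Institute</institution>
          ,
          <addr-line>Troy, NY 12180</addr-line>
          ,
          <country country="US">USA</country>
        </aff>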
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <fpage>26</fpage>
      <lpage>30</lpage>
      <abstract>
        <p>Instant publication, along with the prospect of widespread geotagging in microblog posts, presents new opportunities for real-time and location-based services. Much as these services are changing the nature of search on the World Wide Web, so the Semantic Web is certain to be both challenged and enhanced by real-time content. This paper introduces a semantic data aggregator which brings together a collection of compact formats for structured microblog content with Semantic Web vocabularies and best practices in order to augment the Semantic Web with real-time, user-driven data. Emerging formats, modeling issues, publication and data ownership of microblogging content, and basic techniques for real-time, real-place semantic search are discussed.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Compared to the World Wide Web, the Semantic Web is
lacking in user-driven content. Large, user-curated data sets
such as those of DBPedia<sup>1</sup> and Freebase<sup>2</sup> notwithstanding,
recent analyses [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ][
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] of the Linked Data cloud indicate that
it does not exhibit the power-law distributions or strong
connectivity typical of naturally-evolving networks.
      </p>
      <p>Meanwhile, Web 2.0 services channel large amounts of
potentially valuable user-driven data every minute. Semantic
wikis and the Microformats<sup>3</sup> community aim to bridge this
gap by enabling users to add small amounts of semantic data
to their content, while much of the work on semantic
microblogging thus far focuses on representing users, microblogs and
microblog posts in the Semantic Web: essentially, on doing
for microblogs what SIOC<sup>4</sup> has done for blogs. The work
described in this paper takes the complementary approach
of harvesting semantic data embedded in the content of
microblog posts, or of doing for microblogs what microformats
do for Web pages. Its contribution to the emerging semantic
microblogging ecosystem includes:
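<list list-type="bullet">
<list-item><p>a set of syntax conventions for embedding various structured content in microblog posts</p></list-item>
<list-item><p>an information schema for user-driven data and associated metadata</p></list-item>
<list-item><p>a technique for translating microblog streams into RDF streams in real-time</p></list-item>
<list-item><p>a way of publishing user-driven data in the web of Linked Data while being fair to microblog authors</p></list-item>
<list-item><p>an open-source semantic data aggregator, called TwitLogic, which implements the above ideas</p></list-item>
</list>
In addition, a simple technique for scoring microblog content
based on recency and proximity is presented.</p>
      <fn-group>
        <fn id="fn1"><label>1</label><p>http://dbpedia.org/</p></fn>
        <fn id="fn2"><label>2</label><p>http://www.freebase.com/</p></fn>
        <fn id="fn3"><label>3</label><p>http://microformats.org/</p></fn>
        <fn id="fn4"><label>4</label><p>http://rdfs.org/sioc/spec/</p></fn>
      </fn-group>
    </sec>
    <sec id="sec-2">
      <title>2. NANOFORMATS FOR THE SEMANTIC WEB</title>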
      <p>A number of compact formats, variously called
nanoformats<sup>5</sup>, picoformats,<sup>6</sup> or microsyntax,<sup>7</sup> have been proposed to
allow users to express structured content or issue
service-specific commands in microblog posts. Examples in widespread
use include @usernames for addressing or mentioning a
particular user, and #hashtags for generic concepts. So-called
triple tags even allow the expression of something like an RDF
triple. These formats are subject to a tradeoff between
simplicity and expressivity which heavily impacts community
uptake.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1 Twitter Data</title>
      <p>
        Twitter Data [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is an open proposal for embedding
structured data in Twitter messages. According to its FAQ, “the
purpose of Twitter Data is to enable community-driven
efforts to arrive at conventions for common pieces of data that
are embeddable in Twitter by formal means”. To kick-start
this process, Twitter Data introduces a concrete syntax based
on key/value pairs. For instance,
      </p>
      <p>I love the #twitterdata proposal! $vote +1</p>
      <p>RDF-like triples are possible using explicit subjects:</p>
    </sec>
    <sec id="sec-4">
      <title>2.2 MicroTurtle</title>
      <p>MicroTurtle<sup>8</sup> is a particularly compact serialization format
for RDF, capable of embedding general-purpose RDF data in
microblog posts. It makes use of hard-coded CURIE<sup>9</sup> prefixes
as well as keywords for terms in common vocabularies such
as FOAF,<sup>10</sup> Dublin Core,<sup>11</sup> and OpenVocab.<sup>12</sup> For example:</p>
      <fn-group>
        <fn id="fn5"><label>5</label><p>http://microformats.org/wiki/microblogging-nanoformats</p></fn>
        <fn id="fn6"><label>6</label><p>http://microformats.org/wiki/picoformats</p></fn>
        <fn id="fn7"><label>7</label><p>http://www.microsyntax.org/</p></fn>
        <fn id="fn8"><label>8</label><p>http://buzzword.org.uk/2009/microturtle/spec</p></fn>
      </fn-group>
      <p>Wow! Great band! #mttl #music &lt;#me&gt; r [ !&lt;http://theholdsteady.com/&gt; #altrock ] .</p>
      <p>This expresses, in a named graph tagged “music”, that the
author of the post likes a band, tagged “altrock”, with the
given homepage.</p>
      <sec id="sec-4-1">
        <title>2.3 smesher</title>
        <p>Smesher<sup>13</sup> is a semantic microblogging platform which
collects structured data from microblog posts in a local RDF
store. Its syntax includes key/value pairs similar to Twitter
Data’s which are readily translated into RDF statements:</p>
        <p>RT @sue: I can #offer a #ride to the #mbc09 #from=Berlin
#to=Hamburg</p>
        <p>Smesher users can query their data using SPARQL and
filter it to create customized data streams.</p>
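        <p>As a rough illustration of how key/value hashtags like those in the post above might be pulled apart, the following Python sketch separates plain hashtags from <monospace>#key=value</monospace> pairs. It is not Smesher's actual parser: the regular expressions, the function names, and the <monospace>post:1</monospace> identifier are assumptions made for this example, and the tuples it returns are only stand-ins for RDF statements.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re

# Assumed syntax: "#key=value" with word characters on both sides of "=".
KEY_VALUE = re.compile(r"#(?P<key>\w+)=(?P<value>\w+)")
# A plain hashtag is one that is not followed by "=" (hypothetical rule).
PLAIN_TAG = re.compile(r"#(?P<tag>\w+)(?!\w|=)")


def extract_pairs(text):
    """Return the key/value pairs of a post as a dict, in order of appearance."""
    return {m.group("key"): m.group("value") for m in KEY_VALUE.finditer(text)}


def extract_tags(text):
    """Return the plain hashtags of a post, excluding key/value hashtags."""
    return [m.group("tag") for m in PLAIN_TAG.finditer(text)]


def to_statements(post_id, text):
    """Attach each key/value pair to the post it came from.

    The (subject, predicate, object) tuples are placeholders for RDF
    statements; no real vocabulary is implied.
    """
    return [(post_id, key, value) for key, value in extract_pairs(text).items()]


if __name__ == "__main__":
    post = "RT @sue: I can #offer a #ride to the #mbc09 #from=Berlin #to=Hamburg"
    print(extract_tags(post))
    print(to_statements("post:1", post))
```
]]></preformat>
        <p>For the example post, the sketch yields the plain tags offer, ride and mbc09, and the pairs from=Berlin and to=Hamburg. Keeping the two kinds of hashtag apart lets the key/value pairs become statements while the plain tags remain ordinary topic markers.</p>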
      </sec>
    </sec>
    <sec id="sec-5">
      <title>2.4 TwitLogic</title>
      <p>TwitLogic currently supports a near-natural-language
format which is intended to be particularly memorable and
unobtrusive. Structured content is expressed by
annotating a user name, hashtag or URL with a parenthetical
“afterthought” resembling a relative clause. For example:</p>
      <p>Great convo with @lidingpku (creator of #swoogle) about
ranking algos.</p>
      <p>This approach makes the assumption that hashtags such as
#swoogle are semantically stronger than “ordinary” tags, in
that microblog users who really want to refer their readers
to a specific concept tend to avoid ambiguous tags. Ideally,
the syntax should be natural enough so as not to distract the
reader, yet contrived enough to minimize false positives:</p>
      <p>#sioclog (see http://bit.ly/2uAWo2) makes Linked Data from
IRC logs.</p>
      <p>Some afterthoughts are represented with more than one RDF
statement. For example, the following produces a minimal
review of a movie in terms of the RDF Review Vocabulary:<sup>14</sup></p>
      <p>Who would have guessed such a funny movie as
#Zombieland (3/4) could be made around zombies?</p>
      <p>There are several other “review” formats, such as
LouderTweets,<sup>15</sup> which could be handled in exactly the same way;
all that is needed is an appropriate parser for the format.</p>
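      <p>The afterthought examples above share one shape: a user name, hashtag or URL followed by a parenthetical. The Python sketch below shows one way such a pattern could be recognized with regular expressions. It is only an approximation under stated assumptions, not TwitLogic's parser (which combines BNF grammars with procedural code): the patterns, the function name <monospace>parse_afterthoughts</monospace>, and the <monospace>rating</monospace> label are invented for this illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# A resource is a @username, a #hashtag or a URL (assumed token shapes).
RESOURCE = re.compile(r"@\w+|#\w+|https?://\S+")
# An afterthought is a resource immediately followed by "( ... )".
AFTERTHOUGHT = re.compile(
    r"(?P<subject>" + RESOURCE.pattern + r")\s*\((?P<clause>[^()]+)\)"
)
# A rating afterthought such as "(3/4)".
RATING = re.compile(r"(\d+)\s*/\s*(\d+)")


def parse_afterthoughts(text):
    """Return (subject, phrase, object) tuples for each afterthought found.

    Parentheticals that do not follow a resource, or whose content fits
    neither recognized shape, are ignored to limit false positives.
    """
    results = []
    for match in AFTERTHOUGHT.finditer(text):
        subject = match.group("subject")
        clause = match.group("clause").strip()
        rating = RATING.fullmatch(clause)
        if rating:
            score = (int(rating.group(1)), int(rating.group(2)))
            results.append((subject, "rating", score))
            continue
        # "creator of #swoogle" -> phrase "creator of", object "#swoogle"
        phrase, _, target = clause.rpartition(" ")
        if phrase and RESOURCE.fullmatch(target):
            results.append((subject, phrase, target))
    return results


if __name__ == "__main__":
    print(parse_afterthoughts(
        "Great convo with @lidingpku (creator of #swoogle) about ranking algos."
    ))
```
]]></preformat>
      <p>A real implementation would map each recognized phrase to a vocabulary term before emitting RDF; here the phrase is kept as plain text so that the matching step stays visible.</p>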
      <p>TwitLogic will eventually take advantage of some of the
other formats described above, as well as pre-existing
conventions such as “tag++”. Probably the best approach to the
chicken-and-egg problem of semantic nanoformats is to
promote and build tools to support a variety of formats, see what
“sticks”, and then take steps to keep up with any
community-driven conventions which may arise.</p>
      <fn-group>
        <fn id="fn9"><label>9</label><p>http://www.w3.org/TR/curie/</p></fn>
        <fn id="fn10"><label>10</label><p>http://xmlns.com/foaf/spec/</p></fn>
        <fn id="fn11"><label>11</label><p>http://dublincore.org/documents/dcmi-terms/</p></fn>
        <fn id="fn12"><label>12</label><p>http://open.vocab.org/terms/</p></fn>
        <fn id="fn13"><label>13</label><p>http://smesher.org/</p></fn>
        <fn id="fn14"><label>14</label><p>http://vocab.org/review/terms</p></fn>
        <fn id="fn15"><label>15</label><p>http://www.loudervoice.com/2007/06/13/loudervoice-twitter-mash-up/</p></fn>
      </fn-group>
    </sec>
    <sec id="sec-7">
      <title>3. A USER-DRIVEN SEMANTIC WEB KNOWLEDGE BASE</title>
      <p>There are several kinds of structured content which can be
gathered from a microblogging service such as Twitter:</p>
      <list list-type="order">
        <list-item><p>authoritative information about microblogging accounts
and the people who hold them. SemanticTweet<sup>16</sup> is an
example of a service which publishes the social network
information provided by Twitter on the Semantic Web.</p></list-item>
        <list-item><p>authoritative information about microblog feeds and
individual posts. The SMOB semantic microblogging
system [<xref ref-type="bibr" rid="ref7">7</xref>], for one, represents microblog content at this
level.</p></list-item>
        <list-item><p>user-created information embedded in the text of a
microblog post. Gathering such user-driven “statements
about the world” and using them to populate the
Semantic Web is the main goal of TwitLogic. People,
accounts, and microblog posts are included in the
knowledge base only as contextual metadata to enhance
information discovery and provide author attribution for
the embedded data.</p></list-item>
      </list>
      <fn-group>
        <fn id="fn16"><label>16</label><p>http://semantictweet.com/</p></fn>
      </fn-group>
    </sec>
    <sec id="sec-8">
      <title>3.1 Representing microblog content in RDF</title>
      <p>The schema used for TwitLogic’s knowledge base draws
upon a discussion<sup>17</sup> on the Semantic Web mailing list about
RDF vocabularies for modeling microblogging resources. In
particular, it makes use of a collection of terms from the FOAF,
SIOC, Dublin Core, Named Graphs<sup>18</sup> and Basic Geo<sup>19</sup>
vocabularies.</p>
      <p>
        The sioc:embedsKnowledge property, which has been
proposed in connection with UfoWiki [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], serves to associate a
microblog post with any structured data that has been
extracted from it, in the form of a named graph containing the
extracted RDF statements. This link not only provides source
metadata for those statements, but it also connects them with
a timestamp and, potentially, a placestamp which are useful in
searching and filtering. The use of geo:location as depicted
is convenient, although its domain of geo:SpatialThing
arguably does not include microblog posts.
      </p>
    </sec>
    <sec id="sec-10">
      <title>3.2 Publishing the knowledge base as Linked Data</title>
      <p>
        From the moment it is added to the TwitLogic knowledge
base, the embedded data and contextual metadata of a
microblog post are made available in accordance with best
practices for publishing Linked Data on the Web [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Also
available are a voiD [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] description of the data set as a whole,
periodically generated RDF dumps of the data set, and owl:sameAs
links into related data sets (currently, SemanticTweet’s). In
order to serve all of the information about a resource against
the same dereferenceable URI (regardless of how many
user-driven named graphs the resource is described in), TwitLogic
provides unconventional TriX and TriG serializations<sup>20</sup> of the
data alongside the more common RDF formats.
      </p>
    </sec>
    <sec id="sec-11">
      <title>3.3 Data ownership</title>
      <p>According to Twitter’s terms of service,<sup>21</sup> “you own your
own content”, although that content may be freely copied,
modified, and redistributed by Twitter and its partners. The
data model described above supports authors’ rights by
providing attribution metadata for all user-driven content: the
text of a microblog post is always associated with its author,
and the RDF statements from embedded data in a post are
always contained in a named graph which is associated with
that post. Although TwitLogic does not rdfize every
microblog post that passes through the system, it does maintain
an RDF description of every tweet from which it has extracted
content, so that attribution metadata is guaranteed,
independently of the availability of the Linking Open Data<sup>22</sup> data sets
into which TwitLogic links.</p>
      <p>
        On account of its diverse authorship, the TwitLogic
knowledge base as a whole is published under the Open Data
Commons [<xref ref-type="bibr" rid="ref6">6</xref>] Public Domain Dedication and License (PDDL).
      </p>
      <fn-group>
        <fn id="fn17"><label>17</label><p>http://lists.w3.org/Archives/Public/semantic-web/2009Sep/0174.html</p></fn>
        <fn id="fn18"><label>18</label><p>http://www.w3.org/2004/03/trix/</p></fn>
        <fn id="fn19"><label>19</label><p>http://www.w3.org/2003/01/geo/</p></fn>
        <fn id="fn20"><label>20</label><p>http://wiki.github.com/joshsh/twitlogic/twitlogic-linked-data</p></fn>
        <fn id="fn21"><label>21</label><p>http://twitter.com/tos</p></fn>
        <fn id="fn22"><label>22</label><p>http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/</p></fn>
      </fn-group>
    </sec>
    <sec id="sec-13">
      <title>4. ARCHITECTURE AND IMPLEMENTATION</title>
      <p>The essential components of TwitLogic are an HTTP client,
a collection of nanoformat parsers, and an rdfizer
component which translates microblogging artifacts and
annotations from TwitLogic’s internal object model to equivalent
RDF representations. The HTTP client maintains a
connection with Twitter’s streaming API,<sup>23</sup> receiving status updates,
or JSON-formatted tweets, as they become available. When
the client receives a tweet, it passes it into a parser which
creates an intermediate Java object for the tweet, making it
easier to work with at the lower levels. This tweet object is
then passed into the top level of a hierarchy of parsers which
combine BNF grammars with procedural code to match
expressions in any of the supported nanoformats. If the tweet is
successfully matched by the parser for a specific syntax
convention, the parser will attach an annotation in RDF which
the tweet carries with it into the rdfizer. The rdfizer, in turn,
mints a URI for the graph into which it places the annotation
and maps the tweet and the rest of the metadata into RDF
according to the schema in Figure 1.</p>
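      <p>The flow just described (JSON status update, intermediate tweet object, nanoformat parsers, rdfizer) can be sketched in a few lines of Python. This is a simplified illustration rather than TwitLogic's Java implementation: the class and function names, the <monospace>http://example.org/</monospace> base URI, and the use of plain tuples in place of RDF statements are all assumptions, and only two metadata statements are shown instead of the full schema.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import json
from dataclasses import dataclass, field


@dataclass
class Tweet:
    """Intermediate object for one status update (hypothetical field set)."""
    id: str
    user: str
    text: str
    annotations: list = field(default_factory=list)


def parse_status(raw_json):
    """Turn a JSON-formatted status update into a Tweet object."""
    data = json.loads(raw_json)
    return Tweet(id=str(data["id"]),
                 user=data["user"]["screen_name"],
                 text=data["text"])


def run_parsers(tweet, parsers):
    """Let every nanoformat parser attach the annotations it recognizes.

    Each parser is a callable taking the post text and returning a list
    of (subject, predicate, object) tuples.
    """
    for parser in parsers:
        tweet.annotations.extend(parser(tweet.text))
    return tweet


def rdfize(tweet, base="http://example.org/"):
    """Mint a graph URI and emit (s, p, o, graph) quads for an annotated tweet.

    Tweets without annotations produce nothing, mirroring the idea that
    only posts with extracted content are described.
    """
    if not tweet.annotations:
        return []
    post = f"{base}post/{tweet.id}"
    graph = f"{base}graph/{tweet.id}"
    quads = [
        (post, "sioc:embedsKnowledge", graph, None),
        (post, "sioc:has_creator", f"{base}user/{tweet.user}", None),
    ]
    quads.extend((s, p, o, graph) for (s, p, o) in tweet.annotations)
    return quads
```
]]></preformat>
      <p>Keeping the embedded statements in their own graph, linked from the post, is what allows attribution and time or place metadata to travel with them.</p>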
      <p>At this point, the tweet passes into an RDF stream and from
there into a built-in Sesame triple store. This triple store is
exposed via a SPARQL endpoint as well as the Linked Data
server. The application is configurable with respect to the
underlying triple store and to the set of users and Twitter
Lists it “listens to” through Twitter’s rate-limited API.</p>
      <p>The open-source TwitLogic implementation<sup>24</sup> is available
online. The TwitLogic home page can be found at
http://twitlogic.fortytwo.net/</p>
      <fn-group>
        <fn id="fn23"><label>23</label><p>http://apiwiki.twitter.com/Streaming-API-Documentation</p></fn>
        <fn id="fn24"><label>24</label><p>http://github.com/joshsh/twitlogic</p></fn>
      </fn-group>
      <p>A Linked Data mashup<sup>25</sup> featuring TwitLogic was deployed
at the 8th International Semantic Web Conference.<sup>26</sup> The
mashup, called Linking Open Conference Tweets, gathered
statistics about conference-related Twitter posts to generate
Google visualizations, at regular intervals, as the conference
proceeded. A tweet was considered to be conference-related
if it contained the hashtag, #iswc2009, of the conference, or
the hashtag of any of the subevents of the conference, such
as #sdow2009. This goes beyond the search functionality
provided by the Twitter API. The #iswc2009 hashtag was
used as a starting point for the statistics, but the application
had no other built-in knowledge of the conference. Instead,
it executed SPARQL queries over data provided by
TwitLogic and the Semantic Web Conference Corpus<sup>27</sup> to discover
sub-events on-the-fly. The Conference Corpus provided the
subevent-superevent relationships, while a few individuals
very deliberately tweeted TwitLogic-formatted owl:sameAs
statements to link hashtags to Conference Corpus resources.
Two dozen structured tweets were far more than enough to
categorize hundreds of tweets by individuals with no
knowledge of the syntax, demonstrating that “a little semantics goes
a long way”.</p>
      <p>The author plans to make an enhanced version of the
Linking Open Conference Tweets service available for the 2010
World Wide Web Conference.<sup>28</sup></p>
    </sec>
    <sec id="sec-15">
      <title>5. TOWARDS REAL-TIME, REAL-PLACE SEMANTIC SEARCH</title>
      <p>Apart from collecting and publishing user-driven
semantic data, we would also like to search and reason on it. The
mashup described above uses SPARQL to query over the
data, and other Semantic Web query and reasoning
techniques are equally possible. In this environment, however,
every user-driven statement is associated with a time-stamped
and potentially<sup>29</sup> place-stamped microblog entry. We would
like to take advantage of this metadata in order to score search
results based on nearness in time and location to the context
of the query.</p>
      <fn-group>
        <fn id="fn25"><label>25</label><p>http://tw.rpi.edu/portal/Linking_Open_Conference_Tweets</p></fn>
        <fn id="fn26"><label>26</label><p>http://iswc2009.semanticweb.org/</p></fn>
        <fn id="fn27"><label>27</label><p>http://data.semanticweb.org/</p></fn>
        <fn id="fn28"><label>28</label><p>http://www2010.org/</p></fn>
        <fn id="fn29"><label>29</label><p>http://blog.twitter.com/2010/03/</p></fn>
      </fn-group>
      <p>At least for some types of query, a simple but effective
approach is to keep a record of the influence of a named graph on
intermediate results during query execution and to combine
this with a measure of the significance of the graph, in terms
of space and time, to produce a score for each query result.
Overall significance of a graph is computed as the product
of its significance in time and space. To illustrate, when the
(abbreviated) SPARQL query
<preformat>SELECT ?w WHERE { ?w rdf:type hashtag:workshop }</preformat>
is evaluated against a knowledge base containing the data
in Figure 1, the triple pattern will match one of the
embedded statements which were tweeted in Chantilly, Virginia at a
certain time in October, 2009. Since there is only one triple
pattern, the influence on the corresponding result w of the graph
containing that statement is 1, whereas the significance of the
graph depends on the query context. One might imagine a
SPARQL engine augmented with an appropriate system of
provenance-aware bindings, such that a user in Chantilly at
the time of the workshop would find hashtag:sdow2009 first
in a list of scored solutions, whereas a user in Germany, or
the following day in Chantilly, would find other workshops
first, due to the different significance of graphs in different
contexts. No such query engine has been designed or built,
although current work in progress uses the significance
function in an analogous manner to control activation-spreading.</p>
    </sec>
    <sec id="sec-16">
      <title>5.1 Time-based significance</title>
      <p>Clearly, the significance of a graph decreases over time in
a query environment which favors recency. That is, there is
an inverse relationship between the significance of a graph
and the positive difference between the current (or reference)
time and the timestamp of the graph. Specifically, the
time-based significance, <italic>S</italic><sub>time</sub>, of a statement should have a value
of 1 when there is no difference in time, and should approach
a designated baseline value, <italic>b</italic><sub>time</sub>, as the difference goes to
infinity. A modified exponential decay function has this
behavior:
<disp-formula id="eq1"><label>(1)</label><tex-math><![CDATA[S_{\mathrm{time}}(t) = b_{\mathrm{time}} + (1 - b_{\mathrm{time}}) \cdot 2^{-t/t_h}]]></tex-math></disp-formula>
where <italic>t</italic><sub>h</sub> is the amount of time it takes for the significance of
a statement to drop to half of its original value, disregarding
the influence of <italic>b</italic><sub>time</sub>.</p>
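      <p>Equation (1) translates directly into code. The Python sketch below is a minimal illustration in which the function name, the parameter names, and the choice of seconds as the time unit are assumptions; any consistent unit works, since only the ratio of elapsed time to half-life enters the formula.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def time_significance(elapsed, half_life, baseline=0.0):
    """Time-based significance of a statement, following Equation (1).

    elapsed   -- non-negative difference between the reference time and
                 the timestamp of the graph
    half_life -- time after which the decaying part has dropped to half
    baseline  -- value approached as elapsed goes to infinity (0 <= b <= 1)
    """
    if elapsed < 0:
        raise ValueError("elapsed time must be non-negative")
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return baseline + (1.0 - baseline) * 2.0 ** (-elapsed / half_life)


if __name__ == "__main__":
    one_hour = 3600.0  # seconds; the unit is an assumption of this example
    for age in (0.0, one_hour, 24 * one_hour):
        print(age, time_significance(age, one_hour, baseline=0.25))
```
]]></preformat>
      <p>With a half-life of one hour and a baseline of 0.25, a statement scores 1 when new and 0.625 after one hour, and it never falls below 0.25.</p>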
      <p>The resulting preference for the most recently acquired
information is fitting for microblogging environments, in which
it is generally not possible to “take back” statements which
have been made in the past:<sup>30</sup> instead, one simply makes new
statements.</p>
      <p>The constant <italic>b</italic><sub>time</sub> should be given a value greater than zero
if it is not desirable for old statements to “disappear” entirely.
For example, a statement of a person’s gender is usually as
valid years from now as it is today. However, as new
information becomes available, it will tend to supplant older
information. This “freshness” is particularly advantageous
for properties such as foaf:based_near whose value is subject
to frequent change.</p>
      <fn-group>
        <fn id="fn29-cont"><label>29</label><p>(continued) whats-happeningand-where.html</p></fn>
        <fn id="fn30"><label>30</label><p>Some microblogging services, including Twitter, do allow
users to delete their posts. Nonetheless, once a post is “out
there”, it is potentially out there for good, both in computer
systems and human memory.</p></fn>
      </fn-group>
      <p>Our requirements for location-based significance are the
same as those for time-based significance, except that
geo-distance varies over a finite interval, whereas time-distance
varies over an infinite one. <italic>S</italic><sub>loc</sub> should have a value of 1 at the
current (or reference) location, and a baseline value of <italic>b</italic><sub>loc</sub> at
the maximum distance <italic>d</italic><sub>max</sub>:</p>
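      <p><disp-formula id="eq2"><label>(2)</label><tex-math><![CDATA[S_{\mathrm{loc}}(d) = 1 + (b_{\mathrm{loc}} - 1) \cdot \frac{d}{d_{\mathrm{max}}}]]></tex-math></disp-formula>
where <italic>d</italic> is the great-circle distance from the reference location.
Note that only distance, as opposed to actual position on the
global map, is considered here.</p>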
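      <p>The linear interpolation of Equation (2) can be combined with a standard great-circle distance computation. The Python sketch below uses the haversine formula, which the text does not prescribe; the mean Earth radius of 6371 km, the choice of half the Earth's circumference as the maximum distance, and all function and constant names are assumptions of this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math

EARTH_RADIUS_KM = 6371.0                 # mean Earth radius (assumed value)
D_MAX_KM = math.pi * EARTH_RADIUS_KM     # half the circumference (assumed d_max)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees,
    computed with the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2.0) ** 2)
    # Clamp to guard against rounding slightly above 1.
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def location_significance(distance, d_max=D_MAX_KM, baseline=0.0):
    """Location-based significance following Equation (2).

    Equals 1 at the reference location and falls linearly to the
    baseline at the maximum distance d_max.
    """
    clamped = min(max(distance, 0.0), d_max)
    return 1.0 + (baseline - 1.0) * clamped / d_max


if __name__ == "__main__":
    # Two points on the equator, a quarter of the way around the globe.
    d = great_circle_km(0.0, 0.0, 0.0, 90.0)
    print(d, location_significance(d, baseline=0.25))
```
]]></preformat>
      <p>Because only the distance enters the score, two posts equally far from the reference location receive the same significance regardless of direction.</p>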
    </sec>
    <sec id="sec-17">
      <title>5.3 Closing the world of time and space</title>
      <p>It is necessary to impose a baseline significance of <italic>b</italic><sub>time</sub> or
<italic>b</italic><sub>loc</sub> on graphs with no time or place metadata. Equivalently,
one can think of such graphs as occupying a time and place
long, long ago or at the other end of the world. All of the
data from the rest of the Semantic Web which we might like
to search and reason on resides here. For example, if <italic>b</italic><sub>time</sub>
and <italic>b</italic><sub>loc</sub> are both 1/4, then static data will not fall to less than
1/16th the significance of any new data, since overall significance is the product of the two factors.</p>
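      <p>The baseline rule can be checked with a small calculation. The Python sketch below multiplies the time-based and location-based factors and substitutes the baseline for whichever piece of metadata is missing. The function names, the use of <monospace>None</monospace> to mark missing metadata, and the sample half-life and maximum distance are assumptions of this illustration.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def time_factor(age, half_life, b_time):
    """Modified exponential decay from 1 towards the baseline b_time."""
    return b_time + (1.0 - b_time) * 2.0 ** (-age / half_life)


def location_factor(distance, d_max, b_loc):
    """Linear fall-off from 1 at distance 0 to b_loc at d_max."""
    return 1.0 + (b_loc - 1.0) * min(distance, d_max) / d_max


def overall_significance(age, distance, half_life, d_max, b_time, b_loc):
    """Product of the time and location factors.

    age or distance may be None for graphs without a timestamp or a
    placestamp; such graphs receive the corresponding baseline value.
    """
    s_time = b_time if age is None else time_factor(age, half_life, b_time)
    s_loc = b_loc if distance is None else location_factor(distance, d_max, b_loc)
    return s_time * s_loc


if __name__ == "__main__":
    params = dict(half_life=3600.0, d_max=20000.0, b_time=0.25, b_loc=0.25)
    fresh_local = overall_significance(0.0, 0.0, **params)
    static_data = overall_significance(None, None, **params)
    print(fresh_local, static_data, static_data / fresh_local)
```
]]></preformat>
      <p>With both baselines set to 1/4, a brand-new local statement scores 1 and static data scores 1/16, which matches the bound stated in the text.</p>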
    </sec>
    <sec id="sec-18">
      <title>5.4 What is real-time?</title>
      <p>
        TwitLogic provides real-time queries in that it enables
querying on real-time data: a few milliseconds after a microblog
post is received from Twitter’s streaming API, it is ready to
participate in query answering. The data loses significance
over time, but it remains available for all future queries
unless it is removed by external means. This approach is distinct
from continuous query techniques, such as C-SPARQL [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], in
which a query is first defined, then allowed to match data as
it becomes available. On the other hand, TwitLogic produces
an RDF stream which C-SPARQL could query natively, given
a suitable transport protocol.
      </p>
    </sec>
    <sec id="sec-20">
      <title>6. WORK IN PROGRESS, AND FUTURE WORK</title>
      <p>The dashed lines in Figure 2 indicate TwitLogic
components currently under development. Apart from merely adding
to the triple store, the RDF stream is additionally
broadcast by an XMPP client using the Publish-Subscribe<sup>31</sup>
extension, as has been done in other contexts.<sup>32</sup> An
AllegroGraph<sup>33</sup> instance subscribes to the stream in order to execute
time- and location-based queries using the scoring system
described above. Decoupling the query environment
from the aggregator in this way not only allows the query environment to take advantage of
AllegroGraph’s built-in geotemporal features, but is also a
step towards a more modular architecture in which a service
like TwitLogic merely produces an RDF stream in a
well-understood way, for consumption by external services.</p>
      <p>For now, the relative newness of this domain means that
TwitLogic has to play a number of roles in order to enable its
data to be put to a variety of uses.</p>
      <p>As an incentive to use semantic nanoformats effectively,
Twitter users will be able to access the query API by tweeting
at the @twit_logic user, which responds to correctly-formatted
queries with a bit.ly URL for a query results page. In order to
get better results, it will be in users’ best interest to tweet
relevant information. Since tweet-based queries are answered in
real-time and contain a placestamp for those users who have
opted into Twitter’s geolocation functionality, no extra syntax
is required to make queries time- and location-sensitive.</p>
      <p>Apart from more and more real-time Semantic Web mashups,
possibilities for future work with TwitLogic include
continuous queries, support for a greater variety of semantic
nanoformats, and a trust-based significance factor for query
evaluation.</p>
    </sec>
    <sec id="sec-21">
      <title>7. ACKNOWLEDGEMENTS</title>
      <p>This project has been supported by Franz Inc. as well as
RPI’s Tetherless World Constellation. Special thanks go to
Jans Aasman, James A. Hendler, Deborah L. McGuinness
and Steve Haflich for their positive influence on the concept
and implementation of TwitLogic, to Li Ding and Zhenning
Shangguan for their contribution to the Linking Open
Conference Tweets mashup, and to Jie Bao, Tom Heath, Marko
A. Rodriguez, Alvaro Graves, Gregory Todd Williams, Jesse
Weaver and Xixi Luo for their helpful comments and
feedback.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Alexander</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hausenblas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <article-title>Describing linked datasets: On the design and usage of voiD, the “Vocabulary of Interlinked Datasets”</article-title>
          .
          <source>In 2nd International Workshop on Linked Data on the Web</source>
          , Madrid, Spain,
          <year>April 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Barbieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Braga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ceri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. Della</given-names>
            <surname>Valle</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Grossniklaus</surname>
          </string-name>
          .
          <article-title>C-SPARQL: SPARQL for continuous querying</article-title>
          .
          <source>In Proceedings of the 18th international conference on World Wide Web</source>
          , pages
          <fpage>1061</fpage>
          -
          <lpage>1062</lpage>
          , Madrid, Spain,
          <year>April 2009</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          .
          <article-title>How to publish Linked Data on the Web</article-title>
          . http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Fast</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Kopsa</surname>
          </string-name>
          .
          <article-title>Twitter Data - a simple, open proposal for embedding data in Twitter messages</article-title>
          . http://twitterdata.org/, May
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Halpin</surname>
          </string-name>
          .
          <article-title>A query-driven characterization of Linked Data</article-title>
          .
          <source>In 2nd International Workshop on Linked Data on the Web</source>
          , Madrid, Spain,
          <year>April 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Styles</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          .
          <article-title>Open Data Commons, a license for open data</article-title>
          .
          <source>In 1st International Workshop on Linked Data on the Web</source>
          , Beijing, China,
          <year>April 2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Passant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastrup</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Bojars</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Breslin</surname>
          </string-name>
          .
          <article-title>Microblogging: a Semantic Web and distributed approach</article-title>
          .
          <source>In Proceedings of the 4th Workshop on Scripting for the Semantic Web</source>
          ,
          <year>June 2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Passant</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Laublet</surname>
          </string-name>
          .
          <article-title>Towards an interlinked semantic wiki farm</article-title>
          .
          <source>In 3rd Semantic Wiki Workshop</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          .
          <article-title>A graph analysis of the Linked Data cloud</article-title>
          .
          <source>KRS-2009-01</source>
          ,
          <year>February 2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>